The Measurement of Supervisory Quality in Industry *


Quentin W. File 
Purdue University 


In any organization engaged in a productive activity there must be 
individuals who assume responsibility for directing productive procedures 
at the operational level. The Army has its sergeants; the school, its 
classroom teachers; and the church, its ministers and priests. Industry, 
likewise, must rely on its all-important directors of operations, its 
supervisors. Upon these supervisors rests the responsibility of seeing 
that all the creative planning, technical research, and personnel policies 
bear tangible fruit in the form of maintained or improved productivity. 
Operating supervision is the connecting link between top management 
and the worker, and the chain of harmonious industrial relations can 
be no stronger than this key link.

In order to establish a common basis of understanding, let us define 
an industrial supervisor as an individual who actually directs the pro- 
ductive processes at the scene of operation. Such a definition would 
include individuals between the group leader and departmental supervisor 
levels. 

To meet the wartime demand, supervisory training programs have 
been instituted both by industry and by the government. Most of the 
training has been carried out on an “on-the-job” basis so as to provide a
minimum of interference with regular work activities. In spite of the 
extensive publications describing the various types of training programs 
and testifying to the merits of each, little has been said about the need 
for objectively evaluating the outcomes of such programs or about setting 
up a systematic method for evaluating supervisory quality. 

* This article is based on the author’s thesis of the same title, submitted to the 
Faculty of Purdue University in partial fulfillment of the requirements for the degree of 
Doctor of Philosophy, June, 1944. This study was carried out under the direction of 
Professor H. H. Remmers in collaboration with ten industrial concerns. Funds for
this research were provided by the Division of Education and Applied Psychology of 
Purdue University. 
The need for some instrument for measuring supervisory quality
becomes apparent when one considers the uses to which such a test could 
be put. A good supervisory ability test could be used: (1) to select and 
classify candidates for supervisory training; (2) to evaluate the outcomes 
of supervisory training; (3) for upgrading; (4) to check on the quality of 
present supervisory personnel; (5) as a basis for interviewing and counsel- 
ing supervisors; and (6) as material for group discussion at supervisory 
meetings. 

Management was long prone to assume that to obtain the best super- 
visor one should promote his best worker. In other words, it assumed 
success on a given job to be a reliable measure of the ability of the 
individual to supervise others who do the same kind of work. That 
such is not the case has been repeatedly proved. Ability to deal with 
personalities, loyalty to management, interest in work and numerous 
other factors make it necessary to consider knowledge of a job and other 
specific factors merely an essential part rather than all of the require- 
ments for good supervision. 

The general factors of supervision are many. Each supervisor, 
regardless of rank and experience, must deal with the attitudes of his 
workers, his associates, and his bosses. He must administer company 
policies. He must decide what explanations of wage differentials, vaca- 
tion preferences, penalties, and the like must be given. And, most of all, 
he must be sensitive to potential disunity and dissatisfaction among his 
workers in order that the difficulty can be corrected before it reaches a 
point where production will be affected. 


Construction of the Test 


In the construction of an instrument for measuring supervisory 
quality, consideration must be given to the relative importance of both 
the general and the specific factors involved. The seemingly predomi- 
nant importance of the general factors of supervision is emphasized by 
the large number of books and articles now expressing the need for im- 
proved understanding of human relations and the importance of per- 
sonalities in achieving industrial harmony. The following statement by 
Dodd and Rice¹ is illustrative of present personnel trends: “When it
becomes necessary, new supervisors are selected from the ranks of 
workers, engineers, technicians, and other sources. Experience proves 
conclusively that intelligence, personality, vitality, and leadership should 
outweigh technical or trade ability when the selections are made.” 

Most industrial supervisors are obtained by some form of upgrading.
It seems quite possible, therefore, that any individual who is able to
qualify for a supervisory position on the basis of his general abilities will
either have acquired, or can acquire, the specific knowledge necessary for
handling the job. It was on the basis of the hypothesis that factors
generally common to industrial supervisory positions are the really important
quantities that this project of constructing a valid measure of supervisory
quality was conceived.

¹ Dodd, A. E., and Rice, H. O., eds., How to train workers for war industries, p. 80.

When developing any test, it is necessary to make certain basic 
assumptions. The principal assumptions of this study were: 

1. That ability to supervise workers is something general in nature 
rather than highly specific to a given job or company. The supervisor’s 
effectiveness is, in the long run, dependent upon his understanding of 
and ability to deal with human relations. 

2. That lack of this general ability to deal with workers is the greatest 
single cause of supervisory failures and of management-worker friction. 

3. That knowledge of how to handle the supervisory function can be 
tested by obtaining responses to certain significant questions which are 
drawn directly from problems which frequently confront the supervisor. 

4. That such questions can be obtained by direct contact with 
supervisors on the job, by careful study of the literature concerning 
supervisory fundamentals and supervisory problems, by taking into 
account the relevant principles of psychology, and by systematically 
“weeding out” those items which prove unfruitful.

In selecting the items for the supervisory ability test, How Supervise?,²
three definite objectives were kept in mind.

1. The items must be presented in problem form calling for an
operational response, i.e., the items should ask “What should be done
. . . ?” or “Is it desirable to . . . ?”, etc.

2. The items must have “face” as well as statistical validity. They
must present problems which are pertinent to industrial supervisors 
regardless of the department or the company from which the supervisors 
are selected. 

3. These items must be simply worded so that any supervisor can see 
the problem involved. 


Item Selection 


Items for How Supervise? were selected from three distinct sources:
publications concerning industrial supervision, suggestions from in-
dustrial supervisors and personnel men, and contacts with labor leaders.
The most fruitful and readily available source of potential items was the
industrial literature. Industrial supervisory problems have received
considerable attention in the last half decade, and much has been written
about the importance of harmonious supervisor-worker relations. Es-
pecially valuable among the published works were the human relations
manuals and books dealing with specific supervisory problems and their
solutions. Contacts with various supervisors in a sizable manufacturing
concern offered a means of checking on the practicality of the problems
presented and of obtaining additional items.

² Sample copies of this test may be obtained from the Psychological Corporation,
522 Fifth Ave., New York City.

From the sources mentioned above a pool of 204 items was gathered 
for the experimental edition of How Supervise? These items were divided
into two numerically equal forms. Where items dealt with closely 
related problems, those items were placed in different forms. All other 
items were divided on an odd-even basis. Many of the items deal with 
problems now under heated discussion. These items were purposely 
included in order to give the test added value as a basis for supervisory 
conferences. 
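The splitting rule just described can be sketched in Python. This is a minimal illustration, not the procedure actually used: the function name `split_forms`, the example pairs, and the reading of "odd-even" as alternating assignment over the items that remain after related pairs are placed are all assumptions made for the example.

```python
from typing import Iterable


def split_forms(
    n_items: int, related_pairs: Iterable[tuple[int, int]]
) -> tuple[list[int], list[int]]:
    """Split items 1..n_items into two forms, A and B.

    Hypothetical rule: the two members of each related pair go to
    different forms; every other item is assigned alternately (odd-even)
    by its position among the items that remain.
    """
    form_a: list[int] = []
    form_b: list[int] = []
    placed: set[int] = set()
    for first, second in related_pairs:
        form_a.append(first)
        form_b.append(second)
        placed.update((first, second))
    remaining = [i for i in range(1, n_items + 1) if i not in placed]
    for position, item in enumerate(remaining):
        # Positions 0, 2, 4, ... (the 1st, 3rd, 5th remaining item) go to A.
        (form_a if position % 2 == 0 else form_b).append(item)
    return sorted(form_a), sorted(form_b)


# Hypothetical example: 204 items, with items 10/11 and 50/52 closely related.
form_a, form_b = split_forms(204, [(10, 11), (50, 52)])
print(len(form_a), len(form_b))  # 102 102
```

Alternating over the remaining items, rather than over the original item numbers, keeps the two forms the same length whenever the total is even, as it is with 204 items.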

The personal data which the supervisor was asked to provide were 
purposely made extensive in order to investigate possible relationships 
of these data with test scores and with management’s ratings. This 
section was deliberately placed at the end of the test for three reasons.

1. The supervisor will be more likely to give his undivided attention 
to oral instructions at the beginning of the test if not presented with 
preliminary material to be filled out before answering the test items. 

2. Any resentment which might occur from a fairly extensive set of 
questions will not affect his responses to the items of the test. 

3. Since tests are usually scored from front to back, the total score 
and the name of the supervisor are brought together with a minimum 
of page turning. This advantage was also evident when responses to the 
items were punched on tabulating machine cards. 


Item Validity 


The validity of a test item must always be described as validity with 
respect to some standard of value. One of the most vital, and usually 
the most difficult, problems of test construction is that of securing an 
adequate criterion. One criterion for a test of supervisory quality is 
obviously “success on the job.” Success, however, must be defined in
terms of some standard and by some individual or group of individuals. 

The problem also arises as to what are the best answers to the items 
of a test. To meet the above problem two assumptions were made: 

1. “Good” supervisors as a group know the best answers.

2. Men who write books and articles on industrial supervision and
men actually engaged in directing supervisory training programs as a
group know the correct answers.

³ Gardiner, Glenn L., Better Foremanship, First Edition. New York: McGraw Hill
Book Co., Inc., 1936. Heyel, Carl, Human Relations Manual for Executives. New York:
McGraw Hill Book Co., Inc., 1939.

Unfortunately, simply to assume that the best supervisors know the 
correct answers, though logically sound, does not provide an adequate 
criterion. The really crucial problem becomes one of determining who 
these good supervisors are. The writer felt that two groups of people 
should know—(1) the men who work under the supervisor and (2) the 
individuals to whom the supervisor is responsible: in other words, his 
bosses. 

Ratings by members of management above the supervisor proved 
more available than those of the workers. Out of a total of 972 supervisors
tested, 577 sets of four ratings were obtained. The instrument
used for rating was The Purdue Rating Scale for Supervisors constructed 
by H. H. Remmers and the writer for use on this study. This scale 
asked for an evaluation of the supervisor in terms of seven factors and an 
overall evaluation of his quality when all factors were considered. 

The orthodox method of test validation consists of having so-called 
“experts” answer the items of the test and using their responses as a key
for obtaining a total score for all individuals tested. This total score is 
then used as a criterion for determining the degree of discrimination 
possessed by each item. 

Foremost of the problems in obtaining expert judgments is the 
problem of determining who the experts are. Two groups of individuals 
were sampled to get the expert judgments needed to provide a scoring 
key. Group One of the sample of experts consisted of eight individuals 
who had either written articles or books about industrial problems, or 
were recognized authorities in the field of mental hygiene. To insure 
careful consideration of the test, a check for $10 was enclosed with each 
set of materials sent out. 

Group Two of the experts consisted of thirty-seven individuals working
for the government under the Division of Vocational Training for
War Production Workers. Twenty-four different states were repre- 
sented with no more than five individuals from any state. No financial 
reimbursement was given any member of this group and all responses 
were on a purely voluntary basis. Both groups of experts were asked to 
criticise each item and offer suggestions for improvement as well as to 
give the answers they considered best. 

All scoring of the supervisors’ responses was done from tabulator 
cards. A scoring key was obtained by finding the responses most 
frequently judged best by the two groups of experts combined. Com- 
bining the responses of the paid experts with those of the Training 
Within Industry group was justified on the basis of a correlation of
+.91 ± .01 between the modal responses of the two groups. Items about
which the experts were unable to agree and those on which the modal
responses were “uncertain” were not used in obtaining the total test score
for the supervisors.
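A minimal sketch of the keying and scoring step, assuming the two expert groups' answers have already been pooled per item. The response codes (`"A"`, `"D"`, and `"?"` for "uncertain"), the function names, and the reading of "unable to agree" as a tie for the most frequent answer are hypothetical; the article does not specify them.

```python
from collections import Counter

UNCERTAIN = "?"  # hypothetical code for the "uncertain" response


def build_key(expert_answers: dict[str, list[str]]) -> dict[str, str]:
    """Return {item: keyed answer} from pooled expert responses.

    Items whose two most frequent responses tie (read here as the experts
    being "unable to agree") or whose modal response is "uncertain" are
    left out of the key and so do not count toward total scores.
    """
    key: dict[str, str] = {}
    for item, answers in expert_answers.items():
        ranked = Counter(answers).most_common(2)
        if not ranked:
            continue  # no expert answered this item
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            continue  # no single modal answer
        modal = ranked[0][0]
        if modal != UNCERTAIN:
            key[item] = modal
    return key


def total_score(responses: dict[str, str], key: dict[str, str]) -> int:
    """Number of keyed items on which a supervisor's answer matches the key."""
    return sum(responses.get(item) == answer for item, answer in key.items())


experts = {
    "item1": ["A", "A", "D", "A"],  # modal "A": keyed
    "item2": ["A", "D", "A", "D"],  # tie: dropped
    "item3": ["?", "?", "A", "?"],  # modal "uncertain": dropped
    "item4": ["D", "D", "D", "A"],  # modal "D": keyed
}
key = build_key(experts)
print(key)  # {'item1': 'A', 'item4': 'D'}
print(total_score({"item1": "A", "item4": "A"}, key))  # 1
```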

The average supervisor tested for this study was thirty-four years 
old, married, with one or two dependents, and probably didn’t own his 
home. He had a high school education and had taken a supervisory 
training course. He had worked nearly ten years before becoming a 
supervisor and over six years since becoming one. He was in charge of 
forty-nine workers of both sexes. In the last ten years he had been 
employed by two different companies. 

Further analysis revealed that only 2% of his fellow supervisors were
women; 15% of all supervisors had worked less than two years before 
being promoted and that 38% had been supervisors less than two years; 
only 5% were single while almost 20% had no dependents; 45% owned 
or were buying their homes; 23% had some college training and two- 
thirds of the group had completed supervisory training courses; about 
30% had been with their present employer for at least ten years or else 
had never worked anywhere else; and 73% supervised both men and 
women while only 2% supervised women alone. 

These supervisors were drawn from ten industrial concerns. The 
number of supervisors contributed by each concern was a function of the 
size and organization of that company. 

Tests of the discriminating power of each item of the experimental 
edition of How Supervise? were made with respect to both of the criteria 
previously described, namely, management’s ratings and total scores on
the test. The method used to determine each item’s discriminating 
power was the critical ratio of the difference between the average re- 
sponses of the upper 27% and the lower 27% of the supervisors with 
respect to the criterion. This method was favored because it involved 
no assumptions as to the right and wrong answers to individual items. 
To the extent that a given item yields significant differences with respect 
to an acceptable criterion, the item can be considered valid and its 
correct answer will be indicated by the direction of the difference between 
the upper and lower groups. Thus, as was the case with this study, 
when a fallible criterion is used, the validity coefficient obtained for a 
given item depends only on the item’s value and the general validity of 
all of the experts’ judgments. Weakness of the experts’ responses to 
any one item does not make it impossible for the item to show a significant 
discrimination ratio. 
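The upper-and-lower-27% comparison can be sketched as follows. The sketch assumes each item response is coded on a numeric scale and that the critical ratio is the difference between the two group means divided by the standard error of that difference, computed here with sample variances; the function name and the tie-breaking on the criterion (by original order) are illustrative choices, not details from the study.

```python
import math
import statistics


def critical_ratio(
    item_responses: list[float], criterion: list[float], tail: float = 0.27
) -> float:
    """Critical ratio of the upper-minus-lower difference in mean item response.

    Supervisors are ranked on the criterion (total test score or management
    rating); the top and bottom `tail` fractions form the upper and lower
    groups. A positive ratio means the upper group responds higher.
    """
    if len(item_responses) != len(criterion):
        raise ValueError("one item response per criterion score is required")
    order = sorted(range(len(criterion)), key=lambda i: criterion[i])
    k = max(2, round(tail * len(order)))  # group size; at least 2 for a variance
    lower = [item_responses[i] for i in order[:k]]
    upper = [item_responses[i] for i in order[-k:]]
    # Standard error of the difference between two independent means.
    se_diff = math.sqrt(statistics.variance(upper) / k + statistics.variance(lower) / k)
    return (statistics.mean(upper) - statistics.mean(lower)) / se_diff
```

Because the sign of the ratio shows which group favored the higher-coded answer, the same function indicates the keyed direction of an item without assuming a right answer in advance, which is the property the text cites in favor of the method.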
When selecting items to be retained for the final forms of How Supervise?,
four different factors were considered:

1. Size of the critical ratios of the differences between the upper and 
lower groups with respect to total score on the test. 

2. Size of the critical ratios with respect to management’s ratings. 

3. Degree of discrimination throughout the continuum of possible 
answers, i.e., difficulty of the item. 

4. Company to company variation on the item. 

Most important of the above factors used to determine the value of 
the items were the critical ratios of the differences with respect to total 
score on How Supervise? This criterion was favored because of the high 
agreement between different groups of experts and significant differences 
between the upper and lower groups of supervisors. 


Study of the Criterion 


In this study considerable time and statistical attention were given
to management’s ratings. It was hoped that a criterion could be ob- 
tained which would be independent of the items of the test itself. The 
desirability of securing this independent measure of supervisory quality 
is readily apparent when one reviews the advantages which such a 
criterion offers. 

1. Management ratings are made at the actual scene of operation. 
In addition to being independent of the experimental edition of the test, 
such ratings should constitute a measure of actual success on the job. 

2. Since management is solely responsible for determining what 
individuals will be promoted to supervisory positions, much can be said 
for selecting those individuals who measure up to management’s 
standards. 

3. Management as a group should be more adept at making ratings 
since most modern industrial organizations make use of some form of 
merit rating. 

The 577 supervisors rated for this study were employees of six different 
industrial concerns, ranging in size from 500 workers to 20,000 workers. 
Reliability coefficients and intercorrelations of the rating scale items were 
computed for the entire population tested. In general, the correlation 
between the traits of the rating scale tended to be greater than the reli- 
abilities of the traits correlated. At least 17 of the 28 item intercorrela- 
tions significantly exceeded 1.00 when corrected for the unreliability of 
the items. Since no correlation above 1.00 can exist in practice, it must 
be assumed that corrections for attenuation are not applicable to these 
the items, but, like all other formulae assuming random sampling, makes
no allowance for constant or systematic errors. This formula is, there- 
fore, not applicable to data in which these constant errors occur. 
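For reference, the correction for attenuation referred to here is Spearman's formula, which divides the observed correlation between two measures by the square root of the product of their reliabilities. The numbers in the worked line are hypothetical and serve only to show how a corrected value can exceed 1.00.

```latex
\[
  \hat{r}_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}},
  \qquad \text{e.g.}\quad
  \frac{0.80}{\sqrt{0.70 \times 0.75}} = \frac{0.80}{0.7246} \approx 1.10
\]
```

Whenever the observed intercorrelation is larger than the geometric mean of the two reliabilities, the corrected coefficient is greater than 1.00, which is the situation described for at least 17 of the 28 intercorrelations.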

Many authors of statistical texts have assumed that there can be no 
significant correlation between completely unreliable measures of a given 
pair of traits.⁴ When corrected correlations above 1.00 were found,
these correlations were assumed to be due to sampling error. It was 
assumed that further sampling would reveal either lower item inter- 
correlations, higher item reliabilities, or both. 

The writer advances the hypothesis that under conditions where 
excessive halo⁶ and logical errors exist, these spuriously high
intercorrelations between traits can be obtained from successive samples of
randomly selected ratings. The only requirement for such a condition 
is that the logical relation of one trait to another be greater for all raters 
than the reliability of those raters’ judgments. 

Four tests of refined methods of scoring the Management Ratings 
were made. In the data tested all attempts at statistical refinement 
failed to reveal significant increases in the reliability of the ratings. The 
methods tested included weighting by Beta weights, z-scores, and item
reliabilities. 
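One reading of the z-score method is sketched below: each rater's scores are standardized against that rater's own mean and spread before being averaged, so that lenient and severe raters are placed on a common scale. The article does not describe how the z-scores were applied, so this interpretation, the data layout, and the function name are assumptions.

```python
import statistics
from collections import defaultdict


def rater_z_scores(ratings: list[tuple[str, str, float]]) -> dict[str, float]:
    """Average z-scored rating per supervisor.

    `ratings` holds (rater, supervisor, score) triples. Each rater's scores
    are standardized against that rater's own mean and standard deviation,
    removing differences in leniency and spread between raters; the
    resulting z-scores are then averaged for each supervisor.
    """
    by_rater: dict[str, list[float]] = defaultdict(list)
    for rater, _, score in ratings:
        by_rater[rater].append(score)
    stats = {r: (statistics.mean(s), statistics.pstdev(s)) for r, s in by_rater.items()}

    z_by_supervisor: dict[str, list[float]] = defaultdict(list)
    for rater, supervisor, score in ratings:
        mean, sd = stats[rater]
        # A rater who gives everyone the same score contributes z = 0.
        z_by_supervisor[supervisor].append((score - mean) / sd if sd else 0.0)
    return {s: statistics.mean(z) for s, z in z_by_supervisor.items()}


ratings = [
    ("lenient", "S1", 8), ("lenient", "S2", 9), ("lenient", "S3", 10),
    ("severe", "S1", 2), ("severe", "S2", 3), ("severe", "S3", 4),
]
print(rater_z_scores(ratings))
```

In this hypothetical example the two raters' raw scores never overlap, yet after standardization both place S1 lowest and S3 highest at identical z-values.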

Of the 204 items of both forms of the experimental edition of How 
Supervise? 23 items yielded critical ratios of 2.00 or better for the differ- 
ences between the upper and lower groups as selected by management 
ratings. Since fewer than five items could be expected to occur by chance
under these conditions, some idea of what management expects of its 
supervisors can possibly be gained by examining the nature of these items. 
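The chance expectation can be checked with a short calculation. It assumes the count refers to differences in one direction, that is, the probability that a standard normal deviate reaches +2.00, and that the 204 items behave as independent trials. Counting both directions would roughly double the expectation to about nine, still well below the 23 items observed.

```python
import math

n_items = 204
# One-tailed probability that a standard normal deviate reaches +2.00.
p_one_tailed = 0.5 * math.erfc(2.00 / math.sqrt(2))
expected_by_chance = n_items * p_one_tailed
print(f"p = {p_one_tailed:.4f}, expected = {expected_by_chance:.1f}")  # p = 0.0228, expected = 4.6
```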

Assuming that the item preferences of management-selected super- 
visors do reflect management’s opinions of what constitutes good super- 
vision, the following observations can be made. Management believes: 

1. Its supervisors should accept responsibility for keeping their 
department’s production up and its costs down. 

2. In standardized procedures even to the point of failing to recognize 
the importance of individual differences among workers. 

3. In deeds rather than words. 

4. Its supervisors’ first and foremost responsibility is to management. 

5. Fines are not the best way to discipline workers. 

6. Workers should not be given regular rest periods. 

7. Procedures for granting raises and promotions are management’s 
business, not the workers’. 


⁴ Peters, Charles C., and Van Voorhis, Walter R., Statistical procedures and their
⁶ Tiffin, Joseph, Industrial psychology.

Probably most significant of the findings on the management-rating
criterion is the absence of any significant critical ratios on items dealing 
with the mental hygiene aspects of supervisor-worker relations. Most 
of these items were highly significant on the total score criterion. One 
may well wonder to what extent the principles of improved personnel 
relations have trickled through to operational levels. 

To investigate the relation of management’s ratings to personal 
information about the supervisors, the correlations between ratings and 
age, marital and home status, education, working experience, special 
training, number of men supervised, and number of companies worked 
for were computed. Only three of these correlations were significantly 
above zero at or above the 5% level of confidence. There appears to be 
a slight tendency for supervisors with a great deal of working experience
before being promoted to receive somewhat higher ratings. Supervisors
who have taken supervisory training courses are rated slightly higher. 
More significant, however, is the relation between number of men super- 
vised (an indication of rank) and management’s ratings. This relation 
is a natural one, since it is generally assumed that promotions are most
frequently given to individuals whom management considers best.


Validity of Management’s Ratings 


Since the analysis of results on The Purdue Rating Scale for Super- 
visors revealed management’s ratings to be of questionable validity, 
these were used as a secondary criterion for the validation of the test 
items rather than as a principal basis for the selection of discriminating 
items. Reasons for skepticism as to the validity of this criterion are 
listed below. 

1. Unusually large halo effect indicating that only one general factor 
was being measured. 

2. Relatively low reliability of total scores on the ratings. Near- 
significant increases in reliability were obtained where corrections for 
differences in judges were possible. Since this correction could not be 
made on 80% of the ratings, a known source of sizable errors did affect 
the validity of the criterion. 

3. Variations in quality of raters from department to department 
doubtless existed. Such variations would tend to increase the spread 
of the rating scores and thus increase the computed reliabilities. 

4. Almost complete failure to find significant discrimination values
on supervisory problems recognized by industrial experts as important,
that is, on items about which two groups of experts were able to agree.

Several hypotheses can be advanced for the failure of management’s 
ratings to prove their worth as a criterion. The following seemed most 
tenable to this writer:
1. The pyramid of authority inherent in industrial organizations
does not provide the necessary contacts for multiple rating by manage-
ment. Sound organization, according to this hierarchy, requires that
each supervisor be responsible to only one individual. Requiring four
men to rate a supervisor probably necessitates calling in raters who have
had little contact with the person they rate.

2. Rating conditions are more difficult to standardize than testing 
conditions. Inherent in these ratings are such factors as personal 
relations between rater and supervisor, personality characteristics of the 
rater, his experience in rating, and company attitude toward merit rating. 

3. Varying standards adopted by raters in a highly skilled department 
as compared to raters in a non-technical division and company-to- 
company variations can account for extremely wide variations in rating 
scores which have no basis in terms of supervisory quality. 

4. Quite possibly either or both management rating and How Super- 
vise? deal with only part of the required abilities necessary for good 
supervision. For example, operational management has long disregarded
the area of mental hygiene for workers and the effects of employee
attitudes. How Supervise? deals primarily with this area.


Judgment of Experts 


In addition to being asked to provide the best answers to the experi- 
mental edition of How Supervise?, each of the industrial experts was 
asked to mark items which he thought were ambiguous or of no value and 
to indicate any corrections or comments he cared to make. Each expert 
was also asked if he thought the language used in the items would be 
understood by supervisors and if the approach to the problem was a 
practical one. Over 90% of the experts thought the language was 
sufficiently clear, while between 75% and 80% thought the approach 
to the problem practical. 

Two indications of the reliability of the experts’ responses were
obtained: (1) a correlation of .91 between the modal responses of the
paid experts and those of the Training Within Industry experts, and (2) a
correlation of .80 between the experts’ scores on Form A and their
scores on Form B when both forms were scored against the key derived
from their own modal responses. This would indicate that the reliability
of the total score criterion was approximately .89.
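The step from the inter-form correlation of .80 to a reliability of about .89 matches the Spearman-Brown formula for a test twice the length of one form. The article does not name the formula, so the line below is a reconstruction of the arithmetic rather than a statement of the author's method.

```latex
\[
  r_{\text{total}} = \frac{2\, r_{AB}}{1 + r_{AB}}
                   = \frac{2(0.80)}{1 + 0.80}
                   = \frac{1.60}{1.80} \approx 0.89
\]
```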

Traditional requirements for test validity were well satisfied in this
experimental study of How Supervise? Experts in the field in which the
test is designed to operate were able to agree as to the correct answers
to the items. The reliability of the experimental edition was found to
be .84 ± .01 for scores on the two forms of the test combined
(N = 577). Wide variations in the total scores made by supervisors
were found, ranging from near chance to almost complete agreement with
the scoring key.

As would be expected, wide company differences in the average 
quality of supervisors, as measured by total test scores, were found. 
Of the forty-five differences between the average test scores made by 
supervisors of the ten companies tested, 15 were significant at the 10% 
level, 10 at the 5% level, and 7 at the 1% level of confidence. This 
would indicate that some factor or factors were measured by the test 
which exist in varying amounts in different companies. This is es- 
pecially interesting since significant differences were also found between 
the scores of given groups of supervisors at the beginning and end of 
training periods. The significant improvement measured in the latter 
situations indicated that the test was measuring an improvement which 
occurred during this period and that there was an overlap between
material covered in the course and the content of How Supervise?. 
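The forty-five comparisons are the number of distinct pairs among ten companies, and a rough chance baseline follows from multiplying that number by each significance level. This sketch reads the reported counts as cumulative (each includes the differences significant at stricter levels) and ignores the fact that pairwise comparisons sharing a company are not independent, so the expected values are only approximate.

```python
from math import comb

n_companies = 10
n_pairs = comb(n_companies, 2)  # every pair of companies compared once: 45
observed = {0.10: 15, 0.05: 10, 0.01: 7}  # significant differences reported above
for alpha, count in observed.items():
    expected = n_pairs * alpha
    print(f"{alpha:.0%} level: {count} observed vs {expected:.2f} expected by chance")
# 10% level: 15 observed vs 4.50 expected by chance
# 5% level: 10 observed vs 2.25 expected by chance
# 1% level: 7 observed vs 0.45 expected by chance
```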


Validity of How Supervise? 


Briefly summarized, the experimental indications of the validity of 
How Supervise? are: 

1. Supervisory achievement in industrial training courses has been 
measured and significant improvements found. 

2. Areas which industrial experts consider vital have been reliably 
measured with test items about which the experts agree. Coefficient of 
Reliability = +.84 ± .01.

A study was made of the relation between total score on How Super-
vise? and such personal information as marital status, age, education, 
number of men supervised, etc. Several correlations were found which 
were significantly above zero but not of sufficient size to indicate that 
personal data would be of importance in selecting good supervisors. A 
correlation of +.35 between education and total test scores is the only
relation of sufficient size to be of importance in the selection of supervisors. 
The optimum amount of correlation which should exist between education 
and a test of supervisory quality is problematical. While such a test 
should not correlate highly with amount of education, doubtless, formal 
education does provide valuable learning situations which are generally 
helpful. 

It is interesting to note that the greater proportion of this correlation 
resulted from differences between supervisors who had college training 
and those who did not have college training. For example, 44% 
of the elementary school graduates were above the 50th percentile on the 
overall norms, and 50% of the high school graduates were above this 
point. At the college level, however, 69% of the supervisors who com-
pleted one year of college and 74% of the college graduates were above
this median score for all supervisors. This would seem to indicate that
selection at the college level tends to “weed out” individuals who have
failed to develop an understanding of the general factors of human 
relations, or that colleges in the student’s first year of training provide 
considerable opportunity for gaining insight into human relations 
problems. 

No major changes were made which seemed likely to cause the 
supervisor to place a different interpretation on the problem presented. 
Revisions were made only on items whose relation to the total score
criterion was significant and for which comments by either the experts or
the supervisors tested indicated some confusion in interpretation.

Items not included in the final form were disqualified for the following 
reasons: lack of discriminating power, too easy, or weak on one or more 
criteria. 

Each form of the final edition of How Supervise? contains 70 items 
which are divided into three categories. Care has been taken to make 
each division of a given form equivalent to its corresponding division 
in the other form. Each division is equated on the following factors: 

1. Variability of the item—standard deviation of all supervisors’ 
responses. 

2. Discrimination index—critical ratio of the difference between the 
mean responses of the upper and lower groups. 

3. Difficulty index—deviation of the average response of all super- 
visors from the correct response. These values were computed on the 
basis of a three-answer continuum. 

4. Number of positive and negative items in each category of each 
form. 
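The variability and difficulty indices can be sketched as follows, assuming responses are coded 1, 2, and 3 along the three-answer continuum and the keyed answer is one of those codes. The coding, the use of the population standard deviation, and the function names are assumptions; the discrimination index would come from a critical-ratio calculation on the upper and lower groups.

```python
import statistics


def item_indices(responses: list[int], correct: int) -> dict[str, float]:
    """Variability and difficulty for one item on a three-answer continuum.

    Responses are coded 1, 2 or 3 (a hypothetical coding, e.g. 1 = agree,
    2 = uncertain, 3 = disagree); `correct` is the keyed code.
    """
    if correct not in (1, 2, 3) or any(r not in (1, 2, 3) for r in responses):
        raise ValueError("responses and key must be coded 1, 2 or 3")
    return {
        # Spread of all supervisors' responses to the item.
        "variability": statistics.pstdev(responses),
        # Distance of the average response from the keyed response.
        "difficulty": abs(statistics.mean(responses) - correct),
    }


def division_profile(items: list[tuple[list[int], int]]) -> dict[str, float]:
    """Mean variability and difficulty over a division's items, for equating."""
    indices = [item_indices(resp, key) for resp, key in items]
    return {
        name: statistics.mean(ix[name] for ix in indices)
        for name in ("variability", "difficulty")
    }
```

Comparing `division_profile` for corresponding divisions of the two forms gives one way to check that they are equated on these two factors.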

Quite naturally, industrial participation in this experimental program 
was not of a benevolent nature. To insure that the companies, as well 
as we, would receive appreciable benefits, the following reports were sent 
to each cooperating concern. 

1. Scores for each supervisor on the test with percentile values based 
upon norms for all supervisors tested in this study. 

2. Individual reports on the quality of each supervisor as rated by 
four members of management. 

3. Summaries of supervisors’ scores on the experimental edition of 
How Supervise? by departments where a sufficient number of supervisors 
were tested to make such breakdowns meaningful. 

4. An overall summary of the company’s scores on the test with an 
indication of their relative position with respect to other companies 
tested. 
5. An item-by-item tabulation of the per cent of the company’s
supervisors who gave each of the possible responses to that test item. 

6. Where forms of the test were given at the beginning and again at 
the end of a training period, comparisons between each supervisor’s 
scores were given, together with an overall evaluation of the program 
as a whole. 
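Report 1 gives each supervisor a percentile value based on the norms for all supervisors tested. The study does not state which percentile convention was used; the sketch below assumes the common mid-rank definition (scores below, plus half of the tied scores, as a percentage of the norm group). The function name and the norm scores are illustrative only.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Mid-rank percentile of `score` within a norm group (assumed convention)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)            # scores strictly lower
    ties = bisect_right(ordered, score) - below    # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


norms = [42, 55, 55, 61, 70, 78, 83, 90]   # hypothetical norm-group scores
print(percentile_rank(61, norms))  # 43.75
```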


Summary and Conclusions 


Conclusions drawn from this study can best be made in terms of the 
hypotheses advanced when plans for the experimental project were 
conceived. These hypotheses logically fall into three categories: (1) 
those which deal with the nature of industrial supervision, (2) those which 
are concerned with criteria against which supervisory quality can be 
measured, and (3) those which deal with methods of scoring and com- 
puting data. Both the hypotheses and findings concerning them are 
discussed below. 

The hypotheses advanced as to the nature of industrial supervision 
were: 

1. Important aspects of industrial supervisory ability can be measured
by test items which are equally applicable to all industrial concerns. True.
In this study, 140 discriminating items were found that showed no
significant variation with respect to the size or nature of the industrial
concern. Both industrial experts and management expressed confidence
in the importance of these items.

2. The mental-hygiene aspects of industrial supervision are of primary 
importance. In other words, supervisor-worker relations are among the 
key determinants of good or poor supervision. True. Several indications
of the validity of this hypothesis were found. 

a. The average discriminating power of the items of How Supervise? 
which dealt with human relations was significantly greater than the 
average discriminating power of factual items. 

b. In response to a felt need, the last decade has witnessed innumer- 
able publications of books and articles dealing with the human-relations 
aspects of industrial supervision. 

c. Supervisory training courses, which place considerable emphasis 
on this area, are now being given. 

d. The existence of labor troubles, so frequently blamed on conflicting 
personalities, adds further emphasis to the importance of mental hygiene 
in industrial relations. 

3. A general test of supervisory ability can be used to evaluate the out- 
comes of supervisory training programs. True. The experimental edition 
of the test was used by two different companies for this purpose. Significant
gains were found in both cases, especially among the poorer
supervisors. 

4. Age, education, and miscellaneous other variables are highly important 
factors in good supervision. Generally false. Of all the personal in- 
formation examined, only education revealed a relationship above bare 
significance with respect to total scores on the test. It should be pointed 
out, however, that experience was measured in terms of two-year inter- 
vals. Differences which exist between a supervisor of one and a half 
years of experience and one with no experience at all may well have 
been overlooked. 

The hypotheses advanced concerning criteria for validating the test 
were: 

1. Four members of management can be found who are sufficiently well 
acquainted with any particular supervisor to rate his abilities accurately. 
Questionable. Ratings obtained for this study were not sufficiently 
valid for use as a criterion for determining test item discrimination. 
Differences in standards set by different raters, lack of knowledge about 
the supervisor rated, and logical error (halo effect) concerning relations 
between rating traits all tended to make the obtained ratings invalid. 

2. Industrial experts as a group give reliable answers to the problems 
presented in the test items. True. Two completely different groups of 
experts agreed closely as to the best answers to the items of the test 
(r = +.91).

3. Top management and industrial experts agree on what constitutes
good supervision. False. Had this hypothesis held, two separate criteria
would not have been needed for the validation of test items.
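The expert-agreement figure above is a product-moment correlation between the two groups' answers across items. A minimal sketch, assuming each group's answer to an item is summarized as its mean coded response (a hypothetical five-point coding, as in the experimental edition); the data are invented for illustration and are not the study's.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean keyed responses of two expert groups on five items.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 5, 4, 5]
print(round(pearson_r(group_a, group_b), 3))  # 0.775
```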

The hypotheses advanced concerning different methods of scoring 
both rating-scale and test data were: 

1. Weighted scoring of ratings significantly increases the reliability of
the total rating scores. Generally false. Of the weighting methods
previously described, the only one that appreciably increased reliability
was the correction for the variability of individual judges. This increase
was significant only at the 11% level of confidence, and was not
applicable to most of the data.

2. Test items which provide five possible responses to each item yield 
more reliable measures of supervisory quality than items which provide only 
three possible responses. False. Identical reliabilities were obtained for 
the two types of items. On the basis of this finding, items in the final 
forms of How Supervise? provide for only three possible responses, 
“agree, uncertain, disagree.” 
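The weighting methods themselves are described earlier in the study and are not reproduced here. As a hedged illustration of what correcting for the variability of individual judges can mean, the sketch converts each judge's ratings to standard scores before averaging, so that a judge who spreads his ratings widely does not dominate the total. The function name and the ratings are hypothetical.

```python
from statistics import fmean, pstdev


def combined_ratings(ratings_by_judge):
    """Average each supervisor's standardized ratings across judges.

    `ratings_by_judge` maps a judge to a list of ratings, one per
    supervisor, in the same supervisor order for every judge.
    """
    standardized = []
    for ratings in ratings_by_judge.values():
        mean, sd = fmean(ratings), pstdev(ratings)
        # Express each rating as a deviation from this judge's own mean,
        # in units of this judge's own spread.
        standardized.append([(r - mean) / sd for r in ratings])
    # Average across judges, supervisor by supervisor.
    return [fmean(column) for column in zip(*standardized)]


ratings = {
    "judge_a": [1, 2, 3],      # narrow spread
    "judge_b": [10, 20, 30],   # same ordering, ten times the spread
}
print(combined_ratings(ratings))
```

Summing the raw ratings instead would let judge_b's wider spread outweigh judge_a tenfold; after standardizing, both judges count equally.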

In addition to the hypotheses accepted or rejected, other observations
were made from the analysis of the experimental data. Assuming that
management-selected supervisors do reflect the attitudes of top management
in their responses, the following observations can be made:

1. Management and industrial experts significantly disagree: 

a. On methods of handling dissatisfied workers. Industrial experts 
favor transfer; management opposes. 

b. On methods of handling complaints. Management favors stand- 
ardized procedures for each type of complaint; the experts favor the 
recognition of individual differences. 

c. As to the desirability of delegating responsibility to workers for 
improving working conditions. Management opposes. 

d. As to the wisdom of allowing regular rest periods. Management 
opposes. 

e. As to whether a worker should be told what promotions he can
expect provided he attains a certain level of proficiency. Management
maintains that such matters of salary and promotion are company
business which should not be disclosed.

2. Industrial supervisors, selected by management as best, are not 
fully aware of the importance of human-relations problems in industrial 
supervision. Very few of these problems as presented in the test items 
approached significance with respect to the management-ratings criterion. 
The same items were highly significant with respect to the total score 
criterion. 

From the hypotheses investigated and observations made, we may
conclude that general factors of supervision do exist and that these
factors can be measured. The human-relations aspects of supervision
are vital and are, of necessity, receiving an ever-increasing amount of
attention from management. Industrial experts, both theoretical and
practical, have rather clear-cut ideas about these general factors. Industrial
management tends to be less progressive and seems to favor
keeping the worker “in his place,” rather than encouraging him to become
interested in “company affairs.” Management’s idea of what it wants in
a good supervisor seems inclined toward negative rather than
constructive methods of handling supervisor-worker relations. Management
is, however, well aware of the factual problems in industry and how
they should be handled. Only on items dealing with the mental-hygiene
aspects of supervision were there indications of significant weaknesses.

From this study, a test of the general aspects of supervisory quality 
has been developed. It is believed that this test, How Supervise?, will 
prove valuable for selecting candidates for and evaluating the outcomes 
of supervisory training programs, for selecting individuals for direct 
promotion to supervisory positions, and for checking on the quality of 
present supervisory personnel. 


Received September 23, 1944. 
It would be presumptuous for me to pretend to a comprehensive
knowledge of the many and varied methods now being used in the 
selection and placement of personnel in the Armed Services. Many 
groups in both the Army and the Navy have been concerned with these 
problems. The civilian Office of Scientific Research and Development of 
the Federal government and certain quasi- and non-governmental civilian 
agencies have made contributions. It may be permissible at this time, 
however, to review some of the trends and mention some of the recurrent 
problems. 

It must be recognized that those of us who have been continuously 
working on the problem of selection of service personnel suffer at this time 
from lack of perspective. We are too close to the work and therefore 
apt to magnify what will ultimately seem like minor operating difficulties. 
The fact that most of the data and tests are at present “classified”¹
further limits what can be said of them. Virtually all of the tests, as 
well as the research and the investigational studies, have been financed 
by the government, and all results are therefore subject to strict govern- 
mental control. Government employees and others having access to 
the data are not permitted to report on such results without official 
permission, and no such permission has been sought for any matters 
covered in this paper. 

The importance of the wise use of manpower has probably been 
recognized to a greater extent in this war than at any previous time. 
The examples of waste and extravagant use of manpower which can be 
cited may seem to deny this statement. The cost-plus type of contract 
does not encourage economy in the use of civilian manpower, and certain 
branches of the military and naval departments have at times deemed 
it wise to reserve or hoard desirable men. The fact still holds, however, 
that in this war more than ever before attention has been paid to putting 
the man in the job for which he is best suited, and assigning to special 
training only such men as could absorb the training in the time allowed.

* An address delivered at the Cleveland meeting of the American Statistical Associa- 
tion, September 13, 1944. 

1 The term “classified,” as applied to government documents or data, signifies that
the material is either “restricted,” “confidential,” or “secret,” i.e., available only to a
small number of specified individuals.
Random selection, which was not unknown in the services—for
example, selecting every other man, or the first ten or twenty men on the 
muster roll—has been replaced by methods clearly superior. Some of 
the more satisfactory devices are costly both in time and in dollars—for 
example, assigning any individual at random to try his hand at a complex 
task and retaining him only after he has clearly demonstrated his ability. 
Such methods are being replaced by more efficient and economical means 
of selection which yield demonstrable savings. 

The increase in use of appropriate selection devices is a relative matter. 
There are still to be found able and experienced members of our armed 
forces who believe that, in selecting the fighting man, only the imponder- 
able personal characteristics are of importance and such variables can 
quickly be recognized—not measured—by only a few men steeped in the 
tradition of the particular branch of service concerned. Even such die- 
hards, however, are gradually coming to recognize that through scientific 
selection—including test scores, an evaluation of background and train- 
ing, an estimate of certain personality traits, and a recognition of interests 
—men are now being trained better and faster. And fast technical 
training, no one will deny, has been of the essence. The fighting men of 
today must be technicians who are well trained in the operation and 
maintenance of complex instruments. 

The most striking evidence that the importance of modern methods 
of selection and placement is being recognized is found in the number of 
psychologists employed in the task. The work in the office of the 
Adjutant General, in the Air Surgeon’s office, in the Medical Research 
group in Naval aviation, in the Bureau of Naval Personnel, in the Armed 
Services Institute, in the projects under the Committee on Service 
Personnel of the National Research Council and its successor, the 
Applied Psychology Panel of the National Defense Research Committee, 
and in the selection and placement of men in the Army and Navy college 
programs—all these attest the increasing role being played by measure- 
ment work in the selection and placement of men. Dollars are so freely 
spent these days that cost figures are less significant, but should someone 
have the time and authority to calculate the cost of the technical and 
developmental work being done by the Army and the Navy in selection 
procedures, the figure would be most impressive. The savings which the
use of the techniques so developed has made possible would be even
more staggering. It must never be forgotten that no weapon is better
than the man behind it—and that modern weapons make heavy demands 
on training, and skill, and discretion. 

There are inherent difficulties in finding desirable and economical 
methods for the selection of men destined for success in a war activity. 
Seldom can all variables be controlled experimentally. Laboratory
set-ups are frequently not feasible. In field checks, a multitude of 
basically irrelevant factors operate to reduce correlation. The require- 
ments for a given school are frequently shifted for reasons not always 
evident to the research psychologist. Sudden demands from the Fleet, 
for example, for more men trained in a particular school may force a 
lowering of the standards for the men taken into the school if, as is usually 
true, the supply of manpower is limited in quality as well as quantity. 
The many difficulties characteristic of field study are augmented by the 
not infrequent and drastic shifts characteristic of the Army and the Navy. 
A radio code school unexpectedly closed down may leave unfinished a 
lengthy and costly experiment. The supply of available men may 
suddenly be changed, rendering a change in standards both desirable 
and necessary. 

There is seldom a clear-cut and unquestioned final criterion against 
which to validate selection procedures. The criterion of successful 
performance under combat conditions of the duty for which the man is 
trained is seldom obtainable. Furthermore, combat conditions are not 
stable. The combat situation is not the same for all men even in a given 
type of work and in the same unit. The best sort of criterion from the 
combat field is usually a rating or estimate by superior officers; and only 
in rare cases can such ratings, with all their weaknesses, be obtained. 
Because this criterion is rarely available and can seldom be obtained in 
any satisfactory fashion, we must accept intermediate criteria. As a 
matter of fact, so-called intermediate or non-combat criteria are entirely
suitable for many tasks. Weeks and months of preparation are necessary 
for a brief period of actual combat. Many service men must regularly 
engage in duties removed from combat. These prosaic day-to-day jobs 
of the technical sergeant, the yeoman, the storekeeper, the radioman, and 
many others are of great importance in making it possible for us to win 
battles. The skill and zeal they show in their non-combat duties should 
not be underestimated. 

Most technical jobs in the services are restricted to men who have 
graduated from formal courses usually called schools. The length of a 
course varies from a few days to several months. To be rated as a 
torpedoman in the Navy, for example, a new recruit must, after general 
preliminary training, go to a school for torpedomen. To obtain the 
coveted wings of the Naval aviator, one must graduate from the extensive 
aviation training courses. Any economy which can be effected in selecting
men who will do well in these schools is obviously worthwhile.
Most selection procedures are designed to pick men who have a high 
probability of success in the school concerned. If scores on a valid 
performance test can be obtained for the criterion, so much the better.
School grades may reflect many traits of the instructor as well as the 
students, and so are frequently less useful. If the skill of the bombardier 
is measured by his hits on a target in fifty standardized practice runs, an 
unusually good criterion is available against which to validate the selec- 
tion and training procedures. Valid performance criteria are only rarely 
obtainable.

If the schools in certain cases have been somewhat out of date, if they 
fail to achieve proper motivation, if they eliminate men for reasons of 
discipline or personality, the problems of selection are complicated. If 
the schools change and improve as rapidly as possible, as they certainly 
should, the nature of the tests and scores used in selection must be 
subjected to constant study and revision. Selection and training cannot 
be separated; they must be dealt with in most cases as a unit. 

The first logical step in evolving a program for selection of men to 
enter a certain technical course or school is to make an analysis of the 
school curriculum. This analysis is usually made informally and some- 
times intuitively. Tests are then selected which are believed to measure 
the traits deemed essential for success in the school. In selecting the 
men to enter the school, the classification officer should evaluate past 
training and experience along with the test scores. Grades in the school, 
graduation from the school, and ratings obtained are frequently used as 
criteria for the validity of the tests. Many different tests, covering the
verbal factor, mathematical aptitude and knowledge, spatial ability,
general aptitude for electronics, mechanical aptitude, etc., have been
given, and various ways of using the results are available. Conditions
within the schools change, as has been pointed out. Thus there are
many difficulties in the way of establishing permanent or final methods
of selection, but the immediate gains made through the use of recommended
procedures have again and again been demonstrated.

Tests and selection procedures which are effective in selecting men 
from one type of population will not necessarily be equally effective in 
picking men from another type of population. If only college graduates 
have been considered for a particular school, a new selection technique 
may be necessary when men with only a high school education are made 
eligible. A selection program suitable for older men with considerable 
trade experience will not necessarily work equally well with men just 
out of school. As the nature of the pool of men from whom selection is 
made changes in its essential characteristics, the testing program must be 
revalidated. For an accurate and complete interpretation of a test score, 
it is usually necessary to know something about the characteristics of the 
population being tested. 
An illustration may be taken from the Army-Navy College Qualifying
Test. An item which is successful on this test (in the sense of predicting
total score on the section) for high school seniors from large urban
communities in Iowa and Nebraska will not necessarily be equally successful
for seniors from small rural locations in New York State. Consider an
item in the section on so-called common-sense physics, which supposedly
tests for knowledge frequently obtained from sources other than the
classroom. One item concerned the long-distance transmission of
electric power. This item was of approximately equal difficulty for the
two groups, rural and urban, but much more valid for the group from
rural New York State. Even in the antonym type of item, significant
differences are found in certain cases, and which items will show them
cannot be predicted in advance. Why should domineering as the opposite
of servile be easier and more valid for New York City seniors than for
those from rural Alabama and Georgia? The determination of the angle
between the hands of a clock at 4:10 is more valid for seniors from urban
centers in California than for those from urban centers in New York.
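For readers curious about the clock item itself, the arithmetic is short: the minute hand moves 6 degrees per minute, and the hour hand 30 degrees per hour plus 0.5 degree per minute. A small sketch (the function name is ours, not part of the test):

```python
def clock_hand_angle(hour, minute):
    """Smaller angle, in degrees, between the hour and minute hands."""
    hour_hand = (hour % 12) * 30 + minute * 0.5   # 30 deg/hour + 0.5 deg/minute
    minute_hand = minute * 6                      # 6 deg/minute
    diff = abs(hour_hand - minute_hand)
    return min(diff, 360 - diff)


print(clock_hand_angle(4, 10))  # 65.0
```

At 4:10 the hour hand stands at 125 degrees and the minute hand at 60 degrees, so the answer is 65 degrees.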

These populations of high school seniors from different areas of the 
country and from rural and urban environments are similar in many ways. 
The shifts in the inductee population with changes in draft regulations 
and with the exhaustion of certain types of eligible men are much greater, 
and cannot be ignored in interpreting test results. 

There are many techniques now available to improve the efficiency 
of tests and selection procedures. Some method of item analysis has 
been widely used by many of the technicians. Through the use of some 
index such as the biserial correlation coefficient, or some estimate of it, 
or certain empirical indices, an analysis of the behavior of a population 
on the individual item can be determined. Non-contributing items can 
thus be eliminated and efficient items retained. The difficulty of the 
item for the population can at the same time be determined. The 
criterion for the analysis usually is the score on the total test, although 
an acceptable external criterion may be even better if it is available. It 
is to be hoped that the comprehensive study of the interpretation of item 
analysis which has been done for the services will eventually be made 
available for wider distribution. 
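As a sketch of the kind of item analysis described, the function below computes an item's difficulty (the proportion passing, p) and its biserial correlation with total score, using the textbook formula r_bis = ((M_p - M_q) / σ_t) × (pq / y), where M_p and M_q are the mean totals of those passing and failing the item, σ_t is the standard deviation of the totals, and y is the height of the unit normal curve at the point dividing it into p and q. The data are invented and the function name is hypothetical; the services' own indices and estimates may well have differed.

```python
from statistics import NormalDist, fmean, pstdev


def item_analysis(item_correct, total_scores):
    """Return (difficulty p, biserial r) for one item against total score.

    item_correct: 1 if the examinee answered the item correctly, else 0.
    total_scores: each examinee's total test score (the criterion).
    """
    passed = [t for c, t in zip(item_correct, total_scores) if c]
    failed = [t for c, t in zip(item_correct, total_scores) if not c]
    if not passed or not failed:
        raise ValueError("item must be passed by some examinees and failed by others")
    p = len(passed) / len(item_correct)   # difficulty: proportion passing
    q = 1 - p
    sigma_t = pstdev(total_scores)        # SD of the criterion
    # Height of the unit normal curve at the point splitting it into p and q.
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    r_bis = (fmean(passed) - fmean(failed)) / sigma_t * (p * q / y)
    return p, r_bis


item = [1, 1, 0, 1, 0, 0, 1, 0]        # hypothetical item responses
totals = [10, 9, 8, 4, 3, 5, 7, 6]     # hypothetical total scores
p, r_bis = item_analysis(item, totals)
print(f"difficulty p = {p:.2f}, biserial r = {r_bis:.3f}")
```

When the criterion is the total test score, the item contributes to its own criterion and the correlation is somewhat inflated; with an acceptable external criterion, pass those scores instead of the totals.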

In measuring the success of a test for selection, the simple or multiple 
correlation is frequently used, with the test scores on the several measures 
serving as the “independent” variables, and the school grades or success 
on some performance measure or ratings by superiors as the criterion 
being predicted. In those cases where the tests are used to eliminate the 
potentially unsatisfactory rather than to predict the degree of success, 
critical or “cutting” scores can be used. In such cases the empirical 
results are shown in a four-fold table of pass or fail on the test, and success
or failure in the school. The assumptions underlying the cutting-score 
procedures are more simple and direct than are those where the tests are 
used to predict the entire range from the lowest failure to the top success 
in the criterion; on the other hand, the cutting-score procedures suffer 
from all the statistical and practical disadvantages of coarse grouping. 
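The cutting-score approach reduces the data to a four-fold table. A minimal sketch, assuming a man "passes" the test when his score is at or above the cutting score; the scores, outcomes, and cutting score are hypothetical.

```python
def fourfold_table(scores, succeeded, cutting_score):
    """Cross-tabulate pass/fail on the test against success/failure in school."""
    table = {
        ("pass", "success"): 0, ("pass", "failure"): 0,
        ("fail", "success"): 0, ("fail", "failure"): 0,
    }
    for score, ok in zip(scores, succeeded):
        test_result = "pass" if score >= cutting_score else "fail"
        school_result = "success" if ok else "failure"
        table[(test_result, school_result)] += 1
    return table


scores = [50, 40, 30, 20, 60]
succeeded = [True, True, False, False, False]
for cell, count in fourfold_table(scores, succeeded, cutting_score=35).items():
    print(cell, count)
```

Keeping the raw counts in the table, rather than percentages only, preserves the information needed to judge the selection cost.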

Certain personality scales and psychoneurotic inventories have been 
used to select men for special attention by the psychiatrist, or to eliminate 
men from further consideration for certain tasks believed to demand 
stable types of personality, such as submarine crews or paratroopers or 
men for certain branches of the intelligence service. In these as in many 
other cases, the selection cost² is an item of importance. Too frequently
the four-fold table of results presents percentages only, without including 
the original raw frequencies. The following example serves to illustrate 
the importance of this omission. Suppose that, in an experimental 
population of 1,000 cases, a cutting-score is established which predicts 
failure for 60% of the men subsequently rejected, at a cost of only 10% 
of those who are acceptable in terms of the final criterion. If, however, 
only 50 of the 1,000 men were finally eliminated, while 950 were success- 
ful, the test would have cost 95 (10%) of the acceptable men, in the pro- 
cess of detecting 30 (60%) of those who should be eliminated. The 
problem then is to determine how many normals one can afford to 
eliminate, in order to detect a large proportion of the defectives. The 
answer to such a question can of course be determined only in the light 
of other factors, such as the seriousness of allowing defectives into the 
work, and the availability of men for the work in question. Where the 
unsatisfactory men may ruin expensive, complex, and difficult-to-replace 
equipment, or endanger the lives of other men, a large selection cost may 
be justified. In other situations where the manpower supply is tight, 
the selection cost must be reduced. 
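The arithmetic of the example can be checked directly. The sketch reproduces the figures in the text (1,000 cases, 50 men eventually eliminated, 60% of them detected, 10% of the acceptable men rejected); the function name is illustrative.

```python
def cutting_score_outcome(n_total, n_unsatisfactory, hit_rate, false_reject_rate):
    """Return (unsatisfactory men detected, acceptable men falsely rejected)."""
    n_acceptable = n_total - n_unsatisfactory
    detected = round(hit_rate * n_unsatisfactory)
    falsely_rejected = round(false_reject_rate * n_acceptable)
    return detected, falsely_rejected


detected, selection_cost = cutting_score_outcome(1000, 50, 0.60, 0.10)
print(detected, selection_cost)  # 30 95
# Of the 125 men rejected, only 30 would in fact have been eliminated.
```

The cost, 95 acceptable men, is more than three times the 30 unsatisfactory men detected, which is exactly what a table of percentages alone conceals.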

In some of the extensive work which has been undertaken by those 
working on selection procedures with the armed services, certain methods 
are being followed which may subsequently be judged to have been less 
than perfect. A few of the possible sources of error may be suggested. 
The tests being developed in many cases still cover a more heterogeneous
field than many feel desirable. Relatively pure tests, i.e.,
homogeneous tests, lend themselves to more rigorous
interpretation. If we are to believe general reports, we find that the
Army general classification test contains several quite different types of 


2 As here used, “selection cost” refers to the number of men falsely rejected by the
selection procedure. The smaller the number of acceptable men rejected, the lower the 
selection cost. 
subject matter (verbal, mathematical, and spatial), and yet only a single
score is obtained. The Army-Navy College Qualifying Test, used as the
first screening for over half a million men applying for the Army and
Navy college programs, contained three separate sections (verbal,
mathematical, and science), and yet a single total score served as the
basis for the screening. Other illustrations could be supplied. There are
good practical and theoretical reasons for using these complex tests and
for reporting only a single score; but a good case can also be made for
the use of separate scores for each of the several components. It is
interesting to note that revisions aimed at purer types of measures are
now being developed for the Army basic battery.

In a few cases the selection programs use too many similar tests. In 
some cases, fifteen or sixteen test scores enter into the final selection. 
Possibly a reduced program of five or six relatively pure measures of 
meaningful complexes would do practically as good a job. Paper and 
pencil tests of the aptitude variety have their limitations, and the use of 
a large number of tests gives no assurance that the traits being measured 
are not few in number. Simpler programs of tests are indicated in some 
cases. 

Another deficiency has been the lack of careful analysis of the results
from the total testing program used. Such an analysis would reveal the
weakness just mentioned. Factor analysis, for example, might be
brought into play to show how many different factors are being measured
by the tests employed. Factor-analysis techniques, while not extensively
used in the wartime selection jobs known to the speaker, have been used
effectively in several cases. Simple reliability analyses will show the
unreliability of certain short, time-limited measures still
being used.
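One simple reliability analysis of the kind mentioned is the split-half method: score the odd- and even-numbered items separately, correlate the two halves, and step the result up to full length with the Spearman-Brown formula, r_full = 2r / (1 + r). This is offered as one standard technique, not necessarily the one used in the services; the item data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation of two halves."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per examinee."""
    odd = [sum(items[0::2]) for items in item_scores]    # items 1, 3, 5, ...
    even = [sum(items[1::2]) for items in item_scores]   # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


examinees = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(round(split_half_reliability(examinees), 3))
```

Note that the split-half method is misleading for heavily speeded tests, where an odd-even split understates the effect of the time limit; a retest or alternate-form check suits those better.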

A serious error occasionally made is the establishment of selection 
techniques on populations not representative of the population for which 
the techniques are subsequently to be used. In many instances it can be 
demonstrated that the shifts in population are of great importance and 
the results of the tests cannot be interpreted without reference to the 
characteristics of the population being tested. 

The acceptance of defective and unanalyzed criteria as the basis for 
the validation of tests and selection procedures constitutes a serious 
source of error. Ratings, service school grades, scores on performance 
trials—in short almost every available criterion measure—should be 
carefully checked and analyzed before it is accepted. How was the 
criterion measure obtained? Of what factors is it composed? Does it 
reflect the abilities and the skills which the tests were designed to measure 
or which are essential for success in the job? As testing programs become 
more carefully developed, as more homogeneous or pure types of measures
are used, the final criterion also must be subjected to analysis and 
purification. In many cases it will be found inadequate and will impose 
a limitation on the interpretation of any validity coefficients based on it. 

The poor conditions under which the tests must sometimes be ad- 
ministered also account for unsatisfactory results. At times the tests 
are administered to the men when they are in a frame of mind scarcely 
conducive to obtaining a normal sample of their behavior. For example, 
at one location scheduling complications led to the administration of 
aptitude tests to men immediately following inoculations. In another 
case, men were tested in the evening of their first day of very strenuous 
active work on a new location. Quite regularly, unselected men are being 
tested in groups as large as 500 or 1,000—some experts consider that 
better results could be obtained if groups were smaller. As more and 
more trained testing men take charge, improved procedures can be 
expected. 

In spite of all these difficulties and all the weaknesses in the systems 
being used, results of demonstrable value are being produced. Rela- 
tively simple selection procedures are being shown to be of value in saving 
time and manpower, in putting the men in the jobs for which they have 
aptitude, and in eliminating the unstable and discontented from certain 
types of crucial work. 

In looking toward the future, the growing intricacy of the machinery
of war suggests that the country would be safer if, in direct conjunction
with the development of new machines and techniques, more time were
spent on research in methods of selecting and training personnel. With new machines and
improved techniques for the selection and training of men in their use,
we shall be able to hold our own in any future situation. Psychologists
and statisticians will do well to establish ever more clearly the gains to be 
obtained by simple but thorough selection procedures and the well- 
systematized use of test results in the armed forces. 


Received October 2, 1944. 








Adapting the Minnesota Rate of Manipulation Test 
to Factory Use 


Guy M. Wilson and Staff 


Personnel Testing Department, Raytheon Manufacturing Company, 
Newton, Massachusetts 


“Rate of movement is a unit skill and in and of itself cannot be 
improved. Only the techniques of performance can be improved.” So 
states the author of the Minnesota Rate of Manipulation Test.¹

If this is true, the measurement of an operator’s rate of manipulation 
should reveal valuable information. It should provide a significant 
ranking of operators on the one item, speed of manipulation. 

At one plant² the Minnesota Rate of Manipulation Test was included
in a battery of tests in which it was sought to measure operators as to: 
(1) Intelligence—three tests, (2) Manipulative skill—three tests, (3) 
Special skill—two or three tests, according to the job. It appeared to 
hold its place as a helpful test under manipulative skill. 

In time, however, some questions arose with reference to how best
to use the test, and how to record the results. Time is always a factor
in a production plant. Hence the question: “Could we save two
minutes, more or less, by using three trials instead of four?” The
Manual for the Minnesota Rate of Manipulation Test calls for four trials,
and the final index used is the total time for the four trials. If three
trials would serve as well, valuable time would be saved.

As this question was studied, another arose: “Would the low score of
four trials (or of three) serve as well as the sum of four trials (or of
three) as an index?” If so, time would be saved in adding, and a simpler,
more easily interpreted number could be used as the index. For example,
the sum of four trials for an individual (see Table 1, which follows)
might be 235 seconds. For the same individual the low of four is 54
seconds. Anyone who sees this smaller figure readily interprets it: it
means, “One trial required 54 seconds.”
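To make the four candidate indices concrete, here is a minimal Python sketch that derives them from one worker's four trial times. The times are hypothetical, chosen to agree with a Table 1 worker whose sum of four is 235 seconds and whose low of four is 54 seconds. Treating "three trials" as the first three administered is an assumption; the text does not say which three were used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MrmtIndices:
    """Candidate summary indices for one worker's trial times (seconds)."""
    sum_of_four: int
    sum_of_three: int
    low_of_four: int
    low_of_three: int


def indices(trial_times: list[int]) -> MrmtIndices:
    """Compute the four indices compared in the text from four trial times."""
    if len(trial_times) != 4:
        raise ValueError("expected exactly four trial times")
    # Assumption: "three trials" means the first three trials given.
    first_three = trial_times[:3]
    return MrmtIndices(
        sum_of_four=sum(trial_times),
        sum_of_three=sum(first_three),
        # Lower time is better, so the "low" is the single fastest trial.
        low_of_four=min(trial_times),
        low_of_three=min(first_three),
    )


# Hypothetical trial times (seconds) consistent with the worker in the text.
worker = indices([54, 60, 62, 59])
print(worker)
```

Because every worker contributes the same number of trials, ranking by a sum gives the same order as ranking by the corresponding average; only the low scores can change the order.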

The statistician knows that, regardless of the convenience or reasonableness
of a change in procedure, the change cannot be made unless
statistics justify it. In the case of these two questions the
procedure for study was very simple.

¹ Zeigler, W. A. Manual for Minnesota Rate of Manipulation Test. Educational
Test Bureau, Minneapolis.

² The Raytheon Manufacturing Company, Newton, Massachusetts.

Table 1 shows in column 1 the ordered arrangement of scores made
by 63 subjects according to “sums of four.” Column 2 shows the corresponding
sums of three trials. Column 3 shows the corresponding lows
of four trials, and column 4 shows the corresponding lows of three trials.


Table 1

Various Scores for Each of Sixty-three Factory Workers on the Minnesota
Rate of Manipulation Test (seconds)

(1) Sum of four trials; (2) Sum of three trials; (3) Low of four trials;
(4) Low of three trials.

  (1)     (2)     (3)     (4)   |   (1)     (2)     (3)     (4)
  177     124      40      40   |   229     174      55      56
  184     137      40      40   |   229     174      55      56
  192     141      44      44   |   230     176      54      55
  196     148      48      48   |   230     175      55      57
  199     149      49      49   |   231     172      56      56
  200     151      47      47   |   232     178      54      55
  203     150      49      49   |   235     176      54      54
  206     152      48      48   |   235     182      53      58
  209     158      49      49   |   235     173      54      54
  209     155      50      50   |   236     178      58      59
  210     158      51      51   |   239     182      57      60
  211     159      52      53   |   241     183      58      59
  212     158      52      52   |   243     181      59      59
  212     160      50      50   |   243     183      60      61
  216     165      51      53   |   244     184      59      59
  216     162      53      53   |   245     182      59      59
  216     163      53      53   |   246     186      59      59
  216     162      52      52   |   247     182      60      60
  216     165      51      54   |   248     185      61      61
  218     165      53      53   |   251     189      60      60
  218     162      52      52   |   253     189      62      62
  218     163      52      52   |   255     192      60      60
  219     166      53      54   |   255     191      60      60
  220     164      54      54   |   255     184      59      59
  221     167      54      55   |   255     194      61      62
  223     171      52      55   |   256     194      61      61
  224     170      54      55   |   256     193      63      63
  225     169      55      55   |   258     197      62      65
  225     170      55      56   |   261     196      64      64
  226     168      55      55   |   263     199      64      66
  226     171      55      55   |   280     219      61      67
  228     169      56      56   |
The present problem is so simple that the mere arrangement of the items
in order almost answers the questions raised. There are 15 or 16
misplacements in column 2 when compared with column 1, but the
misplacements are small.

When column 3 is compared with column 1, the story is almost iden- 
tical. There are 16 or 17 misplacements, all very small. In other words, 
the low of four trials would give almost the same ranks as the sum of 
four trials. And the same applies to column 4, the low of three trials. 

If three trials are as good as four, or approximately so, and if the low 
of the trials is as good as, or better than the sum, then we can move on 
to the low of three trials, as the index to use. It will save time, and it 
will be easily understood. 

Comparing columns 3 and 4, the low of four trials and the low of three
trials, it appears that for 43 of the subjects there is a zero difference.
For instance, in the first case the low of four is 40 and the low of three
is 40; in other words, there is no difference. In 12 of the pairs the
difference is one. The first such difference is the twelfth case, where
the low of four is 52 and the low of three is 53. Thus we have 43 plus 12,
or 55, cases in which the difference is zero or one. The other differences
are as follows: 2 cases with a difference of 2; 4 cases with a difference
of 3; 1 case with a difference of 5; and 1 case with a difference of 6.
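The tally of differences between columns 3 and 4 can be done mechanically. The Python sketch below applies it to the left half of Table 1 only (the first 32 cases, transcribed from the table) to keep the listing short; the function itself works on any pair of equal-length columns, so the remaining cases can simply be appended to the two lists.

```python
from collections import Counter

# Left half of Table 1 (the first 32 cases), times in seconds.
LOW_OF_FOUR = [  # column 3
    40, 40, 44, 48, 49, 47, 49, 48, 49, 50, 51, 52, 52, 50, 51, 53,
    53, 52, 51, 53, 52, 52, 53, 54, 54, 52, 54, 55, 55, 55, 55, 56,
]
LOW_OF_THREE = [  # column 4
    40, 40, 44, 48, 49, 47, 49, 48, 49, 50, 51, 53, 52, 50, 53, 53,
    53, 52, 54, 53, 52, 52, 54, 54, 55, 55, 55, 55, 56, 55, 55, 56,
]


def difference_tally(low_four: list[int], low_three: list[int]) -> Counter:
    """Count cases by the difference (low of three) - (low of four)."""
    if len(low_four) != len(low_three):
        raise ValueError("columns must cover the same cases")
    diffs = [b - a for a, b in zip(low_four, low_three)]
    # Dropping the fourth trial can never produce a faster best time.
    if any(d < 0 for d in diffs):
        raise ValueError("low of three is faster than low of four")
    return Counter(diffs)


tally = difference_tally(LOW_OF_FOUR, LOW_OF_THREE)
for diff in sorted(tally):
    print(f"difference of {diff}: {tally[diff]} case(s)")
```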

The author of the Minnesota Rate of Manipulation Test does not 
present the data supporting the reasons for the choice of the sum of four 
trials as the proper index. The sum, of course, is equivalent on a ranking 
basis to the average. An average would give a lower figure, and, there- 
fore, a more easily comprehensible index. On theoretical grounds, and 
in the absence of supporting data, it may be easily argued that a low of 
four trials is better than an average of four trials. In field-day events
such as the pole vault or the high jump, the best score made is taken,
not the average.

It is evident, from a casual study, that the correlation between any 
two columns in Table 1 is very high. The correlation between columns 
3 and 4, for instance, using the product-moment formula, is + .97. The
only negative product in the products column is - 3; there are four
zeros; the other 58 products are positive.
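The product-moment computation itself is short. This Python sketch applies it to columns 3 and 4 of Table 1 as printed, taking deviations from the exact column means. The article does not say how its deviation products were formed, so the count of negative and zero products reported above need not be reproduced by this sketch.

```python
import math

# Columns 3 (low of four) and 4 (low of three) of Table 1, in table order:
# the 32 cases of the left half followed by the 31 cases of the right half.
LOW_OF_FOUR = [
    40, 40, 44, 48, 49, 47, 49, 48, 49, 50, 51, 52, 52, 50, 51, 53,
    53, 52, 51, 53, 52, 52, 53, 54, 54, 52, 54, 55, 55, 55, 55, 56,
    55, 55, 54, 55, 56, 54, 54, 53, 54, 58, 57, 58, 59, 60, 59, 59,
    59, 60, 61, 60, 62, 60, 60, 59, 61, 61, 63, 62, 64, 64, 61,
]
LOW_OF_THREE = [
    40, 40, 44, 48, 49, 47, 49, 48, 49, 50, 51, 53, 52, 50, 53, 53,
    53, 52, 54, 53, 52, 52, 54, 54, 55, 55, 55, 55, 56, 55, 55, 56,
    56, 56, 55, 57, 56, 55, 54, 58, 54, 59, 60, 59, 59, 61, 59, 59,
    59, 60, 61, 60, 62, 60, 60, 59, 62, 61, 63, 65, 64, 66, 67,
]


def product_moment_r(x: list[float], y: list[float]) -> float:
    """Pearson r: sum of deviation products over the root of the sums of squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length columns with at least two cases")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sum_products = sum(a * b for a, b in zip(dx, dy))
    return sum_products / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


r = product_moment_r(LOW_OF_FOUR, LOW_OF_THREE)
print(f"r = {r:+.3f}")
```

The printed value can be set beside the + .97 reported in the text.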

Correlating³ the other columns of Table 1 gives the following values
for r: columns 1 and 2, + .986; columns 1 and 3, + .952; columns 1 and
4, + .968; columns 2 and 3, + .939; columns 2 and 4, + .968.

It was finally concluded in this particular factory to substitute the
low of three trials for the sum of four trials as the index of performance
for the Minnesota Rate of Manipulation Test. It is more convenient,
and it is more easily understood by someone to whom the score is being
explained. It is probably an equally good index, although proving this
last statement would require checking with more cases than were used in
this study and correlating with outside criteria. But this study is
sufficient to raise the question and, probably, to justify the change to
a more convenient and understandable index.⁴ In a busy factory, time and
ease of understanding are important factors.

³ Correlations figured by Rachel Lounsbury.

The above discussion may lead the reader to suspect the use of local 
data for the establishment of local norms. This is correct. First inter- 
pretations were based upon national or published norms. As soon as 
sufficient cases were at hand to fill in what appeared to be a typical 
distribution, local norms were tentatively established. If confirmed by 
later distributions, they were then used with reasonable confidence. 
Constant checking of one’s data is necessary in any case and such check- 
ing sometimes reveals desirable local adaptations. 
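The article does not say in what form its local norms were kept. A minimal Python sketch, assuming they are expressed as percentile ranks within the local group (an assumption) and using hypothetical low-of-three times, might look like this. Because a lower time is better on this test, a worker's rank is the percentage of the norm group that was slower, with tied scores counted as half (also an assumed convention).

```python
from bisect import bisect_left, bisect_right


def percentile_rank(time_seconds: float, norm_times: list[float]) -> float:
    """Percentile rank of a time score within a local norm group.

    Lower times are better, so the rank is the percentage of the norm
    group that was slower, with tied scores counted as half.
    """
    if not norm_times:
        raise ValueError("norm group is empty")
    ordered = sorted(norm_times)
    at_or_faster = bisect_right(ordered, time_seconds)  # count of times <= score
    ties = at_or_faster - bisect_left(ordered, time_seconds)
    slower = len(ordered) - at_or_faster
    return 100.0 * (slower + 0.5 * ties) / len(ordered)


# Hypothetical local norm group of low-of-three times (seconds).
local_norms = [48, 50, 52, 53, 53, 54, 55, 56, 58, 60]
print(percentile_rank(53, local_norms))
```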


Received October 7, 1944. 


⁴ See also Jacob Tuckman: A comparison of norms for the Minnesota Rate of
Manipulation Test. J. Appl. Psych., 28: 121-128, Apr. 1944.








The Horn Art Aptitude Inventory 


Charles A. Horn and Leo F. Smith 
Rochester Institute of Technology, Rochester, New York 


It is the purpose of this paper to describe the Horn Art Aptitude 
Inventory. This aptitude test has been developed by the faculty of 
the School of Applied Art of the Rochester Institute of Technology¹
during the past eight years and has been used with freshmen entering the 
Art School at this institution and with groups of high school art students 
competing for Art School Scholarships. 


Construction 


After having studied and experimented with various art tests over a 
period of years, the Art School faculty were of the opinion that certain 
qualities essential to success in the art field were not being satisfactorily 
measured. The problem of obtaining clues to these qualities in students 
was the objective in designing this test. 

The test is divided into two distinct sections: (1) Drawings of Lines
and Shapes, subdivided into two parts, (A) Scribble Exercise and (B)
Doodle Exercise; and (2) Imagery.²

In section 1 (Part 1A) the Scribble Exercise is designed to give the 
student confidence that he can draw a reasonably simple shape or picture. 
In this part he is asked to draw twenty different items such as a book, a 
fork, etc. and is given a limited time, varying from two to six seconds, 
in which to make each drawing. The total time required for administra- 
tion of this section is approximately five minutes. 

The Doodle Exercise (Part 1B) is designed to obtain examples of the 
student’s quality of lines, ability to follow directions, originality and 
compositional sense. In this part he is asked to draw various lines and 
shapes such as rectangles, triangles, circles, etc. The total time required 
for the administration of this section is approximately five minutes. 

The Imagery Section (Part 2) is designed to obtain an indication of
the scope of the student’s interests and the fertility of his imagination
with respect to the number of ideas and the ability he exhibits in
presenting these ideas. In this section there are twelve rectangles, 2¾
inches by 3½ inches, in which certain key lines are presented; the student
is requested to use these lines as “spring boards” and construct sketches
which are suggested to him by the key lines. Figure 1 shows two pictures
which have been constructed using the key lines given. The total time
required for the administration of this section is approximately forty
minutes.

Fig. 1. Sample pictures B and C drawn on basis of key lines shown in A.

¹ Formerly Rochester Athenaeum and Mechanics Institute.

² The test and manual of directions are distributed by the Educational
Research Office, Rochester Institute of Technology.


Interpretation and Scoring 


In the manual of directions now being prepared for use with this test,
examples of excellent, average, and poor papers are being included.
This will enable a person not trained in art to have a basis upon which 
to make judgments. There are several standards, however, which the 
members of the Art School faculty have identified as important. These 
are: 

1. Order: Are the items so placed on the sheet that they fill it pleasingly
and indicate that the student has a sense of order, or has the student
cramped the items so that the entire page was not used? In other words,
has the work been well planned for the size of sheet being used?

2. Clarity of Thought and Presentation: Are the sketches made with 
a clean line so that there are no erasures or fumbling? Are the items 
recognizable, that is, if the student meant to draw a tree does the drawing 
reasonably resemble a tree? What is the quality of the line used? Is 
it continuous? Is it broken, cramped, or bumpy or is it graceful and 
smooth? 

3. Color: Is there a consistently even tone to the drawings, or does
the sheet appear spotty, i.e., is there evidence of uneven pressure? Are
the items smudged, fuzzy, or erratic in their line quality?

Interpreting the Scribble Exercise 1-A: The directions provided with 
the scoring manual are as follows: 

Open up the folder to page 2 and fold the top sheets back. Lay the 
tests side by side on a long table so that you have an over-all view of the 
work of all the papers. 

Scan up and down the papers quickly keeping in mind the standards 
of order, clarity, and color. In this manner those which are “excellent”
and those which are “poor” may be readily identified, and it will be noted
that this generally leaves a group which may be considered “average.”
If a more precise judgment of rank than “excellent,” “average,” and
“poor” is desired, use the criteria already suggested to divide the
“average” group into “good,” “average,” and “fair.” This then gives five
categories: “excellent,” “good,” “average,” “fair,” and “poor.”

It is considerably more difficult to break up the “average” group 
into three sub-groups than it is to judge the extremes. It has been found 
that a lay person, without any training in art work, can judge the ex- 
tremes with as great accuracy as can competent art teachers, but a lay 
person experiences somewhat more difficulty when efforts are made to
divide the “average” group into the three sub-groups.

Interpreting the Doodle Exercise 1-B: One important student trait 
often identified here is originality. For example, has the student done 
the usual thing and divided the square and rectangle in a symmetrical 
manner, which would indicate conformity or triteness, or has he spotted
the smaller square off center? Similarly, has he broken up the rectangle
exactly in the center of the sides or has he done the unusual, i.e., broken 
it up in an asymmetrical manner? 

Interpreting the Imagery Section 2: The criteria which have already
been mentioned should be kept in mind, but in addition the following
should be noted:

1. The fertility of imagination as indicated by the number of ideas
presented.

2. The scope of interests. Are these limited to one particular type,
such as landscapes, people, or sea scenes, or does the student have a
wide range of interests?

3. The clarity of mental image.

4. Color: Does the student utilize an outline only or does he shade
much of his work?

5. Design: Does the student consistently use abstract forms as
contrasted with the literal or naturalistic?


The junior author of this paper (L. F. Smith), who is a member of the
Educational Research Office and has had no art training, has scored two
groups of these inventories employing the technique of spreading the
tests side by side on a long table and identifying those which are
“excellent,” “average,” and “poor” on a subjective basis. Then, using the
criteria already suggested, he has divided the “average” group into
“good,” “average,” and “fair.”


Table 1

Reliability of Scoring Horn Art Aptitude Inventory
[Coefficients of correlation between ratings given by two Art School Faculty members
(A1 and A2) and member of Educational Research Office (E.R.O.)]

                    Group I *                       Group II **
            A1       A2       E.R.O.              A1       E.R.O.
A1          -       .85        ...                -         .79
A2                   -         ...
E.R.O.                          -                            -

* Group I consisted of 21 Art School students who took this test in the Fall of 1939.
** Group II consisted of the Scholarship Class of 20 high school seniors who took
this test during the Spring of 1944.


Table 1 illustrates the reliability of ratings for these two groups.
In Group I two Art School faculty members and the junior author
independently scored 21 test papers of regularly enrolled art school
freshmen. In Group II one of the Art School faculty and the junior
author independently scored 20 test papers of high school seniors
competing for fellowships in the School of Applied Art.


Validity 


Two studies of validity have already been made and others are in
progress. In the first study all of the students who graduated from the
Art School in 1941, ’42, and ’43 were rated by four Art School faculty
members on their success in the three-year course. The Horn test scores
of these students (N = 52) were then correlated with the average of the
faculty ratings. The Pearson product-moment correlation between the
Horn test scores of these students and the average faculty rating was
+ .53.

In the second study the 36 high school seniors enrolled in the Fellow- 
ship Competition Classes of 1943 and 1944 were rated on their success 
in this class by four Art School faculty members. The Horn test had 
been given to all of these students at the beginning of the class and the 
product-moment correlation between their scores and the average faculty 
rating of success was + .66. 

That the Horn Art Inventory measures something other than in- 
telligence is indicated by the product-moment correlation of only + .15 
between the Inventory test scores for the classes of 1941, ’42, and ’43 
(N = 52) and their American Council on Education Psychological 
Examination scores. That this inventory is of more value in predicting 
success in the three-year art course than is the A.C.E. intelligence test 
is indicated by the product-moment correlation of + .28 between the 
intelligence test scores and the average faculty rating of success for these 
same three classes. 





Summary 


1. The Horn Art Aptitude Inventory has been in the process of 
development for a period of more than eight years at the Rochester 
Institute of Technology. 

2. The unique features of this Inventory are: (A) The student is 
required to make reasonably simple drawings which illustrate the quality 
of line he employs, his appreciation of proportion, and his compositional 
sense, and (B) the student is given exercises which provide an indication 
of the scope of his interests, the fertility of his imagination, and the 
ability to depict pictorially ideas which occur to him. 

3. The scoring is still somewhat subjective but correlations between 
the ratings given test papers by Art School faculty members and a lay 
person vary from .79 to .86 for two different groups of students. It 
appears that a lay person with no training in art can score these papers 
as adequately as members of the Art School faculty. 

4. The product-moment correlation between Horn Inventory test scores
and faculty ratings of success of the 52 graduates of the classes
of 1941, ’42 and ’43 in a three-year full-time program was + .53. The 
correlation between test scores and success in two much shorter Scholar- 
ship Classes was + .66 (N = 36). 
5. The correlation between scores on the Horn Inventory and A.C.E.
intelligence test scores is low (+ .15). This Inventory is of more value
in predicting success in the three-year Art School course than is the
A.C.E. intelligence test, since the correlation between the latter and
course success was + .28.

Additional studies are being carried on in the effort to make the 
scoring of test papers more objective and to determine the effectiveness of 
this as a predictive instrument for different age groups. It is believed 
that the results of these studies will improve this instrument which has 
already been of value in one institution. 


Received September 18, 1944. 








A New Method for the Administration of Individual 
Intelligence Tests 


Raymond Corsini 
Auburn Prison, Auburn, New York 


Most test manuals and books on tests and measurements agree
rather well on the general methods of conducting an individual intelligence
test (1) (2) (3) (4) (5) (6) (7) (8) (9). Terman and Merrill (7)
summarize them in three directives: “(1) Standard procedures must be
followed, (2) the child’s best efforts must be enlisted by the establishment
and maintenance of adequate rapport, (3) responses must be correctly
scored.”

However, no manual gives directions for the actual administration of
an individual test in terms of three variables: (1) where to place the
subject in relation to the examiner, (2) where to keep test materials
during the course of the examination when not in actual use, and (3) how
much of the examiner’s behind-the-scenes activity the subject is to be
permitted to see.

The purpose of this article is to give a description and evaluation of 
various ways in which these three variables are met by examiners, plus 
the description of a new method for administering individual tests which 
appears to be superior to any in present use. 


Placement 


Generally the subject either (1) sits at the right-hand side of the
examiner’s desk or (2) sits behind a table facing the examiner. The second
method is more comfortable for the subject if he has any writing to do or
has to handle any material.


Materials 


Some examiners stow all materials in a desk drawer. Some keep 
materials in small boxes, within a larger box which in turn is put on the 
desk. Some scatter materials loosely over the desk or table. Some 
examiners keep all material out of sight except when in use. Others 
permit material to accumulate on the desk or on the table. 

The best method appears to be that which is most convenient for the 
examiner and which permits no wasteful searching around for an article. 
Generally, it seems best to handle any items so that they will not distract
the subject.


Scoring


There are four popular ways to score the test blank. Two are done 
in the subject’s sight, two are done out of sight. 

Method “A” scores openly on the desk or table.

Method “B” scores openly but makes a check mark to indicate cor- 
rect, and a check mark with a loop to indicate wrong. 

These methods have the good point that they help keep up rapport,
in that nothing is hidden. Method “A,” however, is poor because the
subject tends to change or add to his answer on getting a minus, or may
even demand to know why he is being scored incorrectly.

Method “B” may fool some very dull subjects, but generally subjects 
know when they are wrong, and realize that some hidden procedure is in 
operation. This method tends to cause unrest on the part of the subject. 

Methods “C” and “D” involve scoring the protocol out of sight. In
method “C,” the best of these four methods, a visual barrier made
either from a folder or of more permanent material is interposed
on the desk or table between subject and psychologist. Behind this
barrier, the psychologist prepares materials and scores the protocol. The
advantage is in not letting the subject see how he is being marked, but
its disadvantage lies in its abruptness and the “insult” to the subject.

Method “D” involves folding the test booklet into quarters, keeping
it flat on the bottom of a desk drawer together with stop watch and
manual, and attempting to mark the protocol in an unobtrusive manner
with a stub of a pencil. This method, to the author, seems the worst of
the four, since it soon becomes obvious to the subject that the examiner
is reading from a book in the desk and is slyly making notes meanwhile.


The New Method 


For some time the author of this article has followed a novel procedure 
in administering individual tests that appears to possess superior ad- 
vantages to any of the combinations of the three general variables so far 
described. 

Following the interview, the subject is asked or told to take an
individual test. A table, 18 by 30 inches, is placed parallel to the
pull-out leaf on the right-hand side of the examiner’s desk. The subject
sits behind the table, facing the examiner. A box is then placed on what
is, from the subject’s point of view, the upper right-hand corner of the
small table.

This arrangement allows the subject to sit behind the table, with his 
feet under it. He has plenty of space to write. The examiner makes 
his notations on the pull-out leaf of his desk. 
As soon as both are settled, the examiner removes the test manual
and a test protocol (blank for scoring responses) from the box. This 
immediately indicates the function of the box to the subject. The 
manual is placed on the desk in view of the subject but too far away for 
him to be able to read from it. The protocol is placed on the pull-out 
leaf in front of the box, therefore out of sight of the subject. The box 
acts as a visual barrier, but in so natural a manner that it cannot disturb 
a subject since he accepts the box as an integral part of the examination 
procedure. 





Fig. 1. Schemata for individual test administration (labeled elements:
desk, pull-out leaf, table, examinee).


Whenever any test materials are needed, such as blocks, cards, 
tissue paper, scissors, form boards, etc., they are taken from the box, 
quickly and simply, since they are found in various compartments of the 
three shelves or sections, and immediately after use are returned to their 
proper places. Everything is instantly available, nothing can accu- 
mulate, and since there is no searching for materials the examination 
proceeds swiftly and efficiently. 

The author has had two such boxes constructed, one for the Wechsler-Bellevue
and one for the Stanford-Binet. Of course, the arrangement
of the partitions varies to accommodate different materials. The boxes
are uniform in size, approximately 5 by 10 by 12 inches. Each box consists
of three elements or shelves, stacked on top of each other, hinged at the
back, and locked to each other at the front by a simple catch. The top
element has a hinged cover.*

At the conclusion of the test, the manual is replaced, and the box is 
put away, ready for the next administration. 


Summary 


1. There is no uniformity of individual test administration with 
respect to these variables: a. Placement of subject and examiner; b. 
Maintenance of material during the course of the examination; and c. 
Scoring the test blank. 

2. A new method of administering individual tests is described which 
possesses the following advantages: a. Is standard; b. Is fast, efficient, 
and simple; and c. Reduces subject-examiner friction. 


Received September 22, 1944. 
The Relationship Between Scholastic Achievement and
Personality Adjustment of Men College Students 


George R. Griffiths 
Division of Personnel Management, U. S. Maritime Commission 


Personality is often considered an Aladdin’s lamp to achievement.
If one has “personality,” success is inevitable. If one has no
“personality,” he may as well resign himself to his fate. Fortunately, such
an attitude is fast being displaced by a scientific appreciation of the true
nature of that phantom, “personality.”

Broadly speaking, personality consists both of a person’s reactions 
and responses and of the influence that person wields over others. Space 
limitations preclude a lengthy discourse on the nature of personality, 
but let us take a moment to view some of its main factors or traits. We 
can do no better than to refer to Allport’s table of personality traits (1). 
Listed concisely, they are: 1. Intelligence; 2. Motility; 3. Temperament; 
4. Self-expression; and 5. Sociality. 

Some authors add a sixth, physique. Gaskill (3), over a ten-year 
period, made a survey in his beginning psychology classes of the traits 
most desired in a mate. Seventy per cent ranked health first; forty-five 
per cent ranked intelligence second. His list of the elements of per- 
sonality is as follows: 1. Intelligence; 2. Social adjustment; 3. General 
characteristics of overt behavior; and 4. Physical characteristics. 

A scientific analysis of personality, its traits, and its relationships, 
should proceed by the examination of specific factors and specific rela- 
tionships. The problem here undertaken is to determine whether or 
not there is a significant relationship between personality adjustment 
and academic achievement. 


Previous Investigations 


Those who blandly state that intelligence does not correlate with 
personality are ignoring the fact that intelligence is an integral part of 
personality. What these persons do mean, however, is that intelligence 
does not correlate highly with various other personality traits. High 
intelligence, according to Strang (9), is ordinarily associated with a 
pleasing personality, since intelligence involves insight, the ability to see 
relationships, and the capacity to learn. 
A brief summary of various other pertinent studies may serve to reveal
the nature and findings of previous investigations. A substantial positive
relationship between intelligence as measured by standardized tests
and personality as judged by interviews and ratings was found in
University of Iowa studies (2). However, when scores on personality tests
and questionnaires are used rather than the results of observation, the
coefficients of correlation range around zero (between + .20 and - .20).

Terman (10) found that superior children (most of them with IQ’s 
above 140) are more emotionally stable and more socially adequate than 
unselected children. One study (4) showed a tendency toward small 
negative correlations between scores on neurotic inventories and scholar- 
ship. On the other hand Thurstone found, in applying his Personality 
Schedule to college students, that no relationship existed between in- 
telligence and “neurotic tendency,” but that the less well-adjusted 
students tended toward slightly higher academic grades (6). There is a 
tendency for higher learning performance to be associated with sub- 
missiveness as measured by the A-S scale (5). 

Stagner’s findings (7) were that unstable and maladjusted students 
do less well in proportion to their intelligence than do stable persons; 
that introverts earn proportionately higher marks; and that unfavorable 
scores in emotionality and self-sufficiency are associated with lower 
achievement than would have been predicted from intelligence alone. 
Strang (9) reports a lack of relationship between scholarship and various 
measures of introversion-extroversion. Finally, reference should be 
made to a study by Stedman (8). She found that the average grades of
pupils with certain health defects were only 75% of those of healthy pupils.
However, of her 450 cases, the healthy group numbered only 39.

Occasionally studies appear to show definite relationships between 
personality and scholarship. Others seem to be contradictory. The 
result is that there has as yet been no clear definition of the connection 
between scholarship and personality. 


Study at Ohio University 


This study was undertaken with freshman men at Ohio University to
discover whether there is a relationship between scholastic achievement
and personality adjustment, employing as the statistical technique the
probable error of the difference between means. The measure of scholastic
achievement used was the first-semester point-hour-ratio, i.e., the
total number of points earned divided by the total number of semester
hours of courses carried, where A = 3, B = 2, C = 1, and D = 0. The Bell
Adjustment Inventory was used as a measure of personality adjustment.
This Inventory measures four types of personality adjustment (health,
home, social, and emotional), yielding a score for each area and a total,
composite score. (Strang is firm in her assertion of the unreliability of
such personality questionnaires (9). Her argument is based mainly on
the fact that they fail to differentiate psychiatric patients definitely.
She may be right; that is not the problem here.)
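The point-hour-ratio described above can be sketched in Python as follows. The grade values and the 0.5 probation cutoff come from the study; the course record is hypothetical. With A = 3 the ratio cannot exceed 3.0, so it is points divided by hours, not the reverse.

```python
# Grade points per semester hour, as listed in the study.
GRADE_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0}

# The study placed men on probation for a first-semester ratio below 0.5.
PROBATION_CUTOFF = 0.5


def point_hour_ratio(courses: list[tuple[str, int]]) -> float:
    """Total grade points earned divided by total semester hours carried."""
    hours = sum(h for _, h in courses)
    if hours == 0:
        raise ValueError("no semester hours carried")
    points = sum(GRADE_POINTS[grade] * h for grade, h in courses)
    return points / hours


# Hypothetical first-semester record: (grade, semester hours).
record = [("C", 3), ("D", 5), ("D", 4), ("C", 3)]
phr = point_hour_ratio(record)
print(f"point-hour-ratio = {phr:.3f}, probation: {phr < PROBATION_CUTOFF}")
```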

In this study of personality and scholastic achievement several 
different viewpoints were utilized. The first approach was to compare 
with various other groups those men placed on scholastic probation at the 
end of their first semester for earning a point-hour-ratio of less than 0.5. 
The other groups are: first, a group matched person for person with the 
Probation group in college ability scores; i.e., scores on the Ohio State 
University Psychological Examination (Matched group); second, a group 
selected at random (Average group); and third, a group matched indi- 
vidually with college ability scores as high as those of the Probation 
group were low (Excellent group). 


Table 1

Comparison of Probation Group with Other Groups in College Ability,
Grades, and Personality

                  College        Point-     Total
                  Ability        Hour       Personality
Group        N    Percentile     Ratio      Score
Probation   40      21.3         0.266       39.2
Matched     40      21.3         1.023       36.4
Average     40      49.8         1.416       37.8
Excellent   40      78.7         1.703       34.8

Statistical comparisons of these groups in college ability, grades, and
personality (as measured by the Bell Adjustment Inventory) are presented
in Table 1. All figures are arithmetic means. On the Bell Adjustment
Inventory a low score is the favorable score, so the lower the mean, the
better. Probable errors of the differences between means were computed
as a measure of the significance of those differences. It is generally
accepted that, to be statistically significant, a difference should be at
least four times its probable error. The differences in mean personality
scores and their probable errors are shown in Table 2.


Table 2
Probable Errors of Mean Differences in Personality of Probation and Other Groups

                           Difference
Groups                     of Means     P.E. (Diff.)
Probation and Average      1.4          2.367
Probation and Matched      2.8          2.153
Probation and Excellent    4.4          1.994


In no instance is the difference significant, since each probable error is
too large relative to its difference. The greatest difference is found in the
comparison of the Probation men with the Excellent men, but it is only
2.2 times its probable error. Apparently, men experiencing scholastic
difficulty exhibit no significant personality differences from men of
superior college ability.
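The significance test used throughout this study can be sketched briefly. The probable error (P.E.) is 0.6745 times the standard error. For two independent means, the P.E. of their difference is the square root of the sum of the squared P.E.s. A difference is accepted only if it reaches four times its P.E. This is a sketch under the assumption of independent samples: the text does not say how the person-for-person matching was handled, and it reports no standard deviations. The helper names are hypothetical, and the two P.E. helpers only illustrate how such values are normally obtained. The loop applies the criterion to the values in Table 2.

```python
import math

# Probable error = 0.6745 x standard error (half the normal curve lies within +/- 1 P.E.).
PE_FACTOR = 0.6745


def probable_error_of_mean(sd, n):
    """Probable error of a sample mean, given its standard deviation and size."""
    return PE_FACTOR * sd / math.sqrt(n)


def probable_error_of_difference(pe1, pe2):
    """P.E. of the difference between two independent (uncorrelated) means."""
    return math.hypot(pe1, pe2)


def critical_ratio(diff, pe_diff):
    """How many probable errors the observed difference amounts to."""
    return abs(diff) / pe_diff


# Differences of means and their P.E.s as reported in Table 2.
table_2 = {
    "Probation and Average": (1.4, 2.367),
    "Probation and Matched": (2.8, 2.153),
    "Probation and Excellent": (4.4, 1.994),
}

for groups, (diff, pe) in table_2.items():
    ratio = critical_ratio(diff, pe)
    verdict = "significant" if ratio >= 4 else "not significant"
    print(f"{groups}: {ratio:.2f} x P.E. -> {verdict}")
```

The ratios come out at roughly 0.59, 1.30 and 2.21. None approaches the four-P.E. criterion, and the last is the "2.2 times" cited in the text.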

Another comparison was made of the Bell Adjustment Inventory
scores of Probation men and of men matched as nearly as possible with
point-hour-ratios as high (Opposite group) as those of the Probation
group were low. Table 3 gives the scores of these two groups in college
ability, grades, and personality.


Table 3

Comparison of Probation and Opposite Groups in College Ability,
Grades, and Personality

                   College Ability   Point-Hour   Total Personality
Group         N    Percentile        Ratio        Score
Probation    40    21.3              0.266        39.2
Opposite     40    85.3              2.610        37.4


The difference of 1.8 between the mean personality scores is clearly not
significant, as the probable error of this difference is 2.099. In other
words, men on scholastic probation are approximately equal in personality
adjustment to men with superior scholastic records.


Table 4

Comparison of Groups Divided on the Basis of Personality Scores

                    Total Personality   College Ability   Point-Hour
Group          N    Score               Percentile        Ratio
Very unsat.   37    66.9                42.3              1.018
Unsatis.     122    48.8                39.5              1.004
Average       77    38.5                48.0              1.198
Good         112    17.5                49.0              1.169
Excellent     17     7.1                50.7              1.263

A third approach was to compare groups of men divided on the basis
of their total scores on the Bell Adjustment Inventory. Five groups are
thus differentiated: very unsatisfactory, unsatisfactory, average, good,
and excellent. Table 4 presents their mean scores in personality, college
ability, and grades. The difference in point-hour-ratio between the very
unsatisfactory group and the excellent group is 0.245, in favor of the
excellent group. The probable error of this difference is 0.1187, so the
difference is only about 2.1 times its probable error and is not
statistically significant. Apparently, men students scoring very
unsatisfactory on the Bell Adjustment Inventory do not reveal scholastic
trends significantly different from those scoring excellent in personality.

The fourth attack on the problem was a comparison of two groups
selected on the basis of health scores on the Bell Adjustment Inventory.
This viewpoint was suggested by the study of Stedman cited above.
Men scoring very unsatisfactory in health were separated from those
scoring excellent. Table 5 contains their mean scores in personality,
college ability, and grades.


Table 5
Mean Scores of Groups Divided on the Basis of Health

                    Total Personality   College Ability   Point-Hour
Health         N    Score               Percentile        Ratio
Very unsat.   25    53.1                50.3              1.133
Excellent     22    17.7                55.6              1.441


As an interesting sidelight, the grades of those with very unsatisfactory
health were 78.6% of those of men in excellent health. Although based on
smaller samples, this appears to be in line with Stedman’s finding of 75%.
However, the difference between the mean point-hour-ratios is 0.308,
which is only 2.3 times its probable error, 0.133, and is therefore not
significant. We can conclude, then, that men scoring very unsatisfactory
in health are not significantly inferior in scholastic achievement to those
in excellent health, although the observed difference favors the latter.
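The two figures quoted for the health groups follow directly from Table 5. The short check below only divides the reported means and the reported probable error; the variable names are illustrative.

```python
# Mean point-hour-ratios from Table 5.
phr_very_unsat_health = 1.133
phr_excellent_health = 1.441
pe_of_difference = 0.133  # probable error of the difference, as reported in the text

# Grades of the very-unsatisfactory-health group as a percentage of the excellent group's.
relative_grades = 100 * phr_very_unsat_health / phr_excellent_health

difference = phr_excellent_health - phr_very_unsat_health
ratio_to_pe = difference / pe_of_difference

print(f"relative grades: {relative_grades:.1f}%")  # 78.6%
print(f"difference: {difference:.3f}")             # 0.308
print(f"difference / P.E.: {ratio_to_pe:.1f}")     # 2.3
```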

A fifth comparison was made of the grades of men scoring very
unsatisfactory and men scoring excellent in emotional adjustment on the
Bell Adjustment Inventory. Table 6 contains the mean personality scores,
college ability percentiles, and point-hour-ratios of these two groups.


Table 6
Mean Scores of Groups Divided on the Basis of Emotional Adjustment

Emotional           Total Personality   College Ability   Point-Hour
Adjustment     N    Score               Percentile        Ratio
Very unsat.   40    58.9                42.58             1.144
Excellent    [?]    14.9                [?]               1.118


The difference in point-hour-ratios is 0.026 in favor of the very
unsatisfactory group. This is the only difference in the study that
appeared contrary to expectation. It corresponds, however, with the
tendency Thurstone found in the studies mentioned above. As its probable
error amounts to 0.1001, no importance can be placed on the difference.
It is also the smallest difference revealed in the study, being only about
one-fourth of its probable error.

A final analysis compared the highest and lowest deciles in college
ability to see whether differences in personality might exist. This is more
nearly a comparison of personality with intelligence. Mean scores for
these two groups are contained in Table 7.


Table 7
Mean Scores of Groups Divided on the Basis of College Ability Scores

                    Point-Hour   Total Personality
Group          N    Ratio        Score
Lowest 10th   39    0.533        42.7
Highest 10th  38    2.092        37.7


In this case the difference in personality scores is 5.0 in favor of those
in the highest decile. This difference is 2.1 times its probable error and,
hence, not great enough to be accepted as statistically significant. The
only conclusion that can be drawn is that men in the highest and lowest
deciles of college ability do not show a marked difference in personality.


Results and Conclusions 


The question of whether valid relationships exist between scholastic
achievement (point-hour-ratio) and personality (Bell Adjustment
Inventory) has been examined here from several different points of view.
Briefly, the results are these:


1. Men clearly in scholastic difficulty, having been placed on academic 
probation, are not very much inferior in personality adjustment scores 
to men of superior college ability (Tables 1 and 2). 

2. Men students with brilliant scholastic records are not significantly
better adjusted in personality than men of lowest academic achievement
(Table 3).

3. An analysis of men with very unsatisfactory personality scores 
shows no significant difference in their grades from those with excellent 
personality adjustment scores (Table 4). 
4. A comparison of men with very unsatisfactory health scores and men
with excellent health scores reveals a small but not significant difference
in grades in favor of the excellent group (Table 5).

5. Men with very unsatisfactory emotional adjustment scores tend 
toward higher grades than men of excellent emotional adjustment scores, 
but the difference is not significant (Table 6). 

6. There is no very great difference in personality scores evident 
between men in the lowest decile of college ability (The Ohio State Uni- 
versity Psychological Examination) and men in the highest decile 
(Table 7). 

In every case but one, the difference lies in the direction that suggests
some degree of positive correlation between scholastic achievement and
personality. But since a difference, to be accepted as statistically
significant, must be at least four times its probable error, the differences
found are not large enough to be significant: in these analyses they ranged
from 0.25 to 2.3 times their probable errors. Nevertheless, it seems
reasonable to conclude that the consistency of these differences, even
though they are small, is in itself important. It may mean that our
psychometric techniques, especially personality measures, are in need
of refinement. Then, too, it may mean that actual differences do not
exist, however logical it is to expect them. In any event, no conclusions
can be safely drawn until further research yields more conclusive
results.
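The stated range of 0.25 to 2.3 probable errors can be recovered from the differences and probable errors reported in Tables 2 through 6. The decile comparison is left out because the text gives its ratio (2.1) but not its probable error. The comparison labels are shorthand introduced here for illustration.

```python
# (comparison, difference between means, probable error of the difference),
# as reported in the text and Tables 2-6.
comparisons = [
    ("Personality: Probation vs. Average", 1.4, 2.367),
    ("Personality: Probation vs. Matched", 2.8, 2.153),
    ("Personality: Probation vs. Excellent", 4.4, 1.994),
    ("Personality: Probation vs. Opposite", 1.8, 2.099),
    ("Grades: very unsat. vs. excellent personality", 0.245, 0.1187),
    ("Grades: very unsat. vs. excellent health", 0.308, 0.133),
    ("Grades: very unsat. vs. excellent emotional adj.", 0.026, 0.1001),
]

CRITERION = 4.0  # a difference must reach four times its P.E. to count as significant

ratios = {name: diff / pe for name, diff, pe in comparisons}
for name, r in ratios.items():
    label = "significant" if r >= CRITERION else "n.s."
    print(f"{name:<50} {r:5.2f}  {label}")

print(f"range: {min(ratios.values()):.2f} to {max(ratios.values()):.2f}")
```

The smallest ratio (emotional adjustment, about 0.26) and the largest (health, about 2.32) bracket the range quoted in the text. Every ratio falls well short of the criterion.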


Suggestions for Further Study 


Further study along two lines might be productive of useful results: 

1. The difference in mean personality scores between the Probation 
and the Matched groups in favor of the latter (Table 1) hints that per- 
sonality factors may be present to influence the difference in grades of 
persons of equal college aptitude. The causes of diverging academic 
records of persons of approximately equal mental ability should be 
investigated to determine whether personality factors are present. 

2. Since college students are highly selected, nearly all being above 
normal in intelligence, studies should be made where groups definitely 
below average can be compared with those high in intelligence. 


Received September 30, 1944. 
Negro-White Attitudes Towards the Administration of
Justice as Affecting Negroes 


F. C. Sumner and Dorothy L. Shaed 
Howard University 


It was proposed in this study to measure the degree of unanimity in 
attitudes of Negroes and whites both at the college level and at the adult 
level with respect to the administration of justice as affecting Negroes. 


Method 


A questionnaire was devised consisting of 56 statements taken from 
the spontaneous conversations of Negroes. The respondents were 
instructed to read each statement and to indicate their reaction with a 
circle in one of the following six ways: 


If you feel that the statement is absolutely true, draw a circle around
the symbol $T_4$.

If you feel that the statement is more true than false, draw a circle
around the symbol $T_3F_1$.

If you feel that the statement is about equally true and false, draw a
circle around the symbol $T_2F_2$.

If you feel that the statement is more false than true, draw a circle
around the symbol $T_1F_3$.

If you feel that the statement is absolutely false, draw a circle around
the symbol $F_4$.

In case you do not understand a statement, draw a circle around the
question mark.

Personal information was requested, such as sex, age, race, and whether
or not the respondent had had any court experience (by court experience
was meant any experience from being merely a “spectator” to being a
judge). It actually turned out that the attitudes of those with court
experience differed very slightly from the attitudes of those without
court experience.

Of the 1,099 persons replying to the questionnaire there were 246 
white college students of whom 176 were male and 70 female; 660 Negro 
college students of whom 261 were male and 399 female; 193 adults of 
whom 42 were white and 151 Negro. 
The 906 college students replied from the following colleges:


University of Illinois                  66   (M 44, F 22)
University of North Carolina            90   (M 81, F  9)
University of South Carolina            54   (M 31, F 23)
University of Florida                   36   (M 20, F 16)
West Virginia State College             89*  (M 35, F 54)
Howard University                       48*  (M 26, F 22)
University of Illinois                  13*  (M  0, F 13)
Tennessee State College                 97*  (M 26, F 71)
Florida A. and M. College               94*  (M 49, F 45)
Alcorn A. and M. College (Miss.)        77*  (M 53, F 24)
State College, Orangeburg, S. C.       105*  (M 27, F 78)
Virginia State College                 137*  (M 45, F 92)


Negro college students are starred.


The 193 adults who replied lived in the District of Columbia. 
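The breakdown by college can be cross-checked against the totals stated earlier: 246 white college students (176 male, 70 female) and 660 Negro college students (261 male, 399 female), 906 in all. The tuples below only transcribe the list above; the field layout and the helper `totals` are illustrative choices.

```python
# (college, total, male, female, negro_students), transcribed from the list above.
colleges = [
    ("University of Illinois", 66, 44, 22, False),
    ("University of North Carolina", 90, 81, 9, False),
    ("University of South Carolina", 54, 31, 23, False),
    ("University of Florida", 36, 20, 16, False),
    ("West Virginia State College", 89, 35, 54, True),
    ("Howard University", 48, 26, 22, True),
    ("University of Illinois", 13, 0, 13, True),
    ("Tennessee State College", 97, 26, 71, True),
    ("Florida A. and M. College", 94, 49, 45, True),
    ("Alcorn A. and M. College (Miss.)", 77, 53, 24, True),
    ("State College, Orangeburg, S. C.", 105, 27, 78, True),
    ("Virginia State College", 137, 45, 92, True),
]


def totals(negro):
    """Sum (total, male, female) over the white (False) or Negro (True) colleges."""
    rows = [c for c in colleges if c[4] == negro]
    return (sum(r[1] for r in rows), sum(r[2] for r in rows), sum(r[3] for r in rows))


# Each row's male and female counts should add up to its total.
for name, total, male, female, _ in colleges:
    assert male + female == total, name

print("white:", totals(negro=False))  # (246, 176, 70)
print("Negro:", totals(negro=True))   # (660, 261, 399)
```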

In reducing the great mass of raw data to manageable terms the 
following formula was devised and designated the True-False Index 
(TF Index) of a group in respect to a particular statement: 


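$$\text{TF Index} = \frac{\left(T_4 + T_3F_1 + \tfrac{1}{2}T_2F_2\right) - \left(\tfrac{1}{2}T_2F_2 + T_1F_3 + F_4\right)}{N}$$

where each symbol stands for the number of replies in that category and
$N$ is the total number of replies to the specific statement.

For example, the 174 white college males replying to Statement No.
27 (Judges are entirely free of racial prejudice) distributed as follows:

$T_4$ = 9, $T_3F_1$ = 35, $T_2F_2$ = 24, $T_1F_3$ = 55, $F_4$ = 51

and the TF Index is

$$\text{TF Index} = \frac{(9 + 35 + 12) - (12 + 55 + 51)}{174} = \frac{56}{174} - \frac{118}{174} = 32\% - 68\% = -36\%$$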

This obtained TF Index means that 36 per cent voted against the pro- 
position over and above the remaining 64 per cent who were tied between 
accepting and rejecting it. 
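The TF Index formula can be written as a short function that reproduces the worked example. This is a sketch, assuming N is the sum of the five answer categories, as in the example (9 + 35 + 24 + 55 + 51 = 174). The text does not say whether question-mark replies were counted in N. The function name is hypothetical, and the result is scaled by 100 to match the per-cent scale used for TF Indices.

```python
def tf_index(t4, t3f1, t2f2, t1f3, f4):
    """TF Index (per cent) from the counts of the five reply categories.

    Half of the "about equally true and false" replies are credited to each
    side, so they cancel in the numerator and only enlarge N.
    """
    n = t4 + t3f1 + t2f2 + t1f3 + f4  # assumed: N excludes "?" replies
    if n == 0:
        raise ValueError("no replies")
    true_side = t4 + t3f1 + t2f2 / 2
    false_side = t2f2 / 2 + t1f3 + f4
    return 100 * (true_side - false_side) / n


# White college males on Statement No. 27 (counts from the example above).
print(round(tf_index(9, 35, 24, 55, 51)))  # -36
```

Because the $T_2F_2$ halves cancel, a group whose replies are all "equally true and false" scores exactly 0. Unanimous $T_4$ or $F_4$ replies give +100 or −100.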

TF Indices vary between +100 (unanimous belief of the group in the
truth of a statement) and −100 (unanimous disbelief of the group in the
truth of a statement). When TF Indices are +100 to +34 inclusive,
they indicate a definitely positive reaction on the part of the group,
inasmuch as two-thirds or more of the group accept the proposition; when
TF Indices are +33 to −33 inclusive, they indicate a definitely mixed
reaction on the part of the group, inasmuch as two-thirds or more of the
group are tied between accepting and rejecting the proposition; when
TF Indices are −34 to −100 inclusive, they indicate a definitely
negative reaction on the part of the group, inasmuch as two-thirds or
more of the group reject the proposition.


Results 


Percentages of the 56 statements to which positive, mixed and nega- 
tive reactions are made by each of the several groups are given in Table 1. 


Table 1

Percentages of the 56 Statements to which Positive, Mixed and Negative Reactions
are Made by Each of the Several Groups

Group                         Positive   Mixed   Negative
All White Adults              7%         71%     21%
All White College Students    34         41      25
White College Males           32         43      25
White College Females         30         39      30
All Negro Adults              45         27      29
All Negro College Students    46         34      20
Negro College Males           41         34      25
Negro College Females         48         32      20

From Table 1 it appears that in the white adult group the percentage 
of mixed reactions is higher than either that of positive or that of negative 
reactions and even higher than the combined percentages of positive and 
negative reactions. In the white college groups the percentage of mixed 
reactions is higher than either that of positive or that of negative reactions 
but not higher than the combined percentages of positive and negative 
reactions. On the other hand, it appears that in all Negro groups per- 
centages of positive reactions are higher than either that of mixed or that 
of negative reactions while combined percentages of positive and negative 
reactions are in every case higher than the percentages of mixed reactions. 
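The percentages in Table 1 come from classifying each statement's TF Index with the thresholds given earlier. As a cross-check, the sketch below applies those thresholds to the All White Adults column of Table 2 (values transcribed by hand from that table). The function name is illustrative, and it assumes whole-number indices as printed.

```python
from collections import Counter


def classify(tf):
    """Classify a whole-number TF Index with the +34 / -34 thresholds in the text."""
    if tf >= 34:
        return "positive"
    if tf <= -34:
        return "negative"
    return "mixed"


# TF Indices of All White Adults for Statements 1-56, read from Table 2.
white_adults = [
    -55, -12, 45, -24, -16, 15, 24, 21, -8, -43,
    -58, -21, -25, -76, -41, -60, -11, 0, 7, 10,
    5, -3, -25, -21, -29, -32, -26, -40, -12, 14,
    38, 49, 26, 18, 41, 10, -67, 3, 0, 10,
    0, -10, -3, -15, -5, -22, -39, -41, -24, 29,
    -69, -8, -64, -24, 33, 28,
]
assert len(white_adults) == 56

counts = Counter(classify(tf) for tf in white_adults)
for kind in ("positive", "mixed", "negative"):
    pct = 100 * counts[kind] / len(white_adults)
    print(f"{kind:>8}: {counts[kind]:2d} statements, {pct:.0f}%")
```

The counts come out at 4 positive, 40 mixed and 12 negative statements. This reproduces the 7%, 71% and 21% shown for All White Adults in Table 1.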

The very strong tendency of the white adult group towards mixed
reactions (two-thirds or more of the group being tied between accepting
and rejecting the statements) may be attributed, at least in part, to the
fact that the questionnaires were issued to this group in person by a
Negro, which may have influenced the reactions in a selective or
moderating way. On the other hand, the mixed reactions of this adult
white group appear to be only a fuller manifestation of a tendency to
reservation, i.e., to mixed reaction, already perceptible in every white
group of college students despite white administration of the
questionnaires. Factors more likely to incline white groups toward mixed
reactions may be gleaned to some extent from scattered comments written
by white students in the margins of their questionnaires. For example, a
white female student of the University of South Carolina with no court
experience writes: “Some of these injustices against Negroes may be and
probably are true, but they shouldn’t be.” A white male student from
the same university writes: “Impossible to have an intelligent opinion
on some of these questions. Negro lawyers aren’t exactly numerous in
the South.” A white male student at the University of North Carolina
with court experience writes: “Sectional factors and differences
between Northern and Southern courts are of such a nature that I cannot
answer adequately. Northern and Southern courts are as different as
night and day.” A white male student of the University of Illinois says:
“It is difficult to answer some of the questions unbiasedly because when
they refer to the South they may be true, and false when they refer to
the North.” Another white male student from the same university writes:
“Negroes deserve as much justice as whites, but it is undoubtedly true
that they receive less. It will take many years, if ever, for these social
prejudices to be lived down.” A white female student at the University
of Florida writes: “I have no knowledge of the existence of a ‘real’
Negro lawyer.”

A realization of the complexity of the situation, an appreciation that
the matter has more than one side, a conflict between ideals and practice,
lack of familiarity with the problem, and little or no emotional
involvement in the matter appear to be some of the factors making for
the mixed reactions so conspicuous in the whites.

In Table 2 are presented the 56 statements and the TF Indices of the
various Negro and white groups for each statement. From Table 2 it is
possible to make a direct comparison of any two groups in their reaction
to a specific statement.


Table 2
TF Indices of the Several White and Negro Groups for Each of the 56 Statements

                                Male   Female
                    College  College  College
            Adults Students Students Students

Number of respondents:
    White       42      246      176       70
    Negro      151      660      261      399
    Total      193      906      437      469

1. The practice of Negro lawyers should be confined to routine office work.
    White      −55      −51      −53      −47
    Negro      −97      −64      −69      −70

2. In the eyes of the court one white witness is better than any number of Negro witnesses.
    White      −12      −17      −21       −6
    Negro      [?]       31       16       41

3. Negro lawyers are as well prepared for the practice of law as are white lawyers.
    White       45      −10      −18        7
    Negro       55       48       35       51

4. Where the litigation is between a Negro and a white, the white man or woman is favored to win in court without regard to the merits of the case.
    White      −24       37       38       34
    Negro       34       39       26       48

5. Negro lawyers do not prepare their cases as well as white lawyers.
    White      −16      −41      −38      −48
    Negro      −53      −55      −48      −59

6. Negro jurors are more easily swayed than white jurors.
    White       15       25       31        9
    Negro      −29      −17      −20      −15

7. Negroes give too much irrelevant material in their answers to questions in court.
    White       24       30       31       23
    Negro        1       24       23       24

8. Negroes should be represented on the staff of penal institutions in which the prison population contains Negroes.
    White       21       49       46       56
    Negro       89       77       81       75

9. More severe sentences are meted out to Negroes than to whites for the same offense.
    White       −8       46       49       34
    Negro       64       54       51       22

10. A Negro represented by a white lawyer receives a lighter sentence than a Negro represented by a Negro lawyer.
    White      −43       24       28       13
    Negro       26       26       31       24

11. A light complexioned Negro receives a more severe judgment by a white jury than a dark complexioned Negro.
    White      −58      −47      −38      −68
    Negro      −44      −54      −53      −54

12. A Negro on a jury in the South is afraid to vote contrary to consensus of opinion of the white jurors.
    White      −21       55       55       53
    Negro       52       35       28       39

13. A Negro litigant who is employed in a menial capacity by influential whites is favored to win in court over a Negro not so employed.
    White      −25       61       54       61
    Negro       59       67       73       63

14. Judges have their minds made up before hearing a case when the litigation is between a Negro and a white.
    White      −76      −64      −54      −72
    Negro      −21       −3       −6       −1

15. Negro lawyers feel that Negro jurors are prejudiced in favor of the white side of the case.
    White      −41      −65      −62      −73
    Negro      −24      −31      −34      −29

16. A Negro policeman arresting a white man cannot bring sufficient evidence against him to secure his conviction.
    White      −60      −58      −62      −41
    Negro      −40      −56      −62      −51

17. Many cases of Negro conviction are found to be miscarriages of justice years afterwards.
    White      −11       17       26       −6
    Negro       48       60       58       61

18. Of several persons found flagrantly breaking the law, it is usually the Negro in the group who is arrested.
    White        0       40       33       53
    Negro       66       54       45       61

19. Other things being equal, a white lawyer is favored to win in court over a Negro lawyer.
    White        7       71       75       60
    Negro       54       43       36       48

20. A white woman’s word in accusing a Negro is “proof positive” in court.
    White       10       20       23       13
    Negro       46       55       51       57

21. Without provocation Negroes are beaten and otherwise maltreated by white policemen.
    White        5      −26       −9      −54
    Negro       66       53       52       54

22. White policemen who, without provocation, beat up Negroes are never convicted of their offense.
    White       −3      −30      −19      −59
    Negro       66       30       24       34

23. Negro lawyers will keep you in court the rest of your life.
    White      −25      −82      −82      −83
    Negro      −76      −76      −78      −71

24. White juries are more prejudiced against the Negro on trial than white judges.
    White      −21       47       50       39
    Negro       28       24       20       26

25. White lawyers make light of Negro lawyers in court.
    White      −29       26       31       14
    Negro       −1        9       20        2

26. White policemen do not cooperate with Negro policemen on the force.
    White      −32      −32      −31      −32
    Negro      −51      −27      −34      −20

27. Judges are entirely free of racial prejudice.
    White      −26      −34      −36      −31
    Negro      −72      [?]      −61      −47

28. Negro lawyers lack the integrity of white lawyers.
    White      −40      −29      −24      −42
    Negro      −93      −59      −59      −59

29. In the attitude of the court the Negro has no rights which the white man is bound to respect.
    White      −12      −73      −73      −71
    Negro      −38      −20      −60      −22

30. Even federal courts countenance refusal of certain privileges of the court building to Negro lawyers.
    White       14      −12      −17        0
    Negro       58       21       14       27

31. More white offenders are let off on insanity pleas than Negroes when both are accused of the same offense.
    White       38       60       72       48
    Negro       60       72       72       71

32. White lawyers have more outside influence with the court than do Negro lawyers.
    White       49       89       91       83
    Negro       65       82       88       78

33. In the South a white person must be quite a social outcast to be convicted of a crime against a Negro.
    White       26       38       40       35
    Negro       75       66       56       73

34. The court believes a white lawyer is entitled to a larger fee than a Negro lawyer for the same piece of work.
    White       18        8        8        7
    Negro       40       63       53       69

35. If a Negro is numbered among a group of suspects, he is usually the first to be grilled.
    White       41       60       62       54
    Negro       73       71       68       73

36. Negro lawyers have few opportunities to practice other than routine office work.
    White       10       43       39       53
    Negro        5       43       31       51

37. Negroes do not deserve the privilege of a court trial like other people.
    White      −67      −98      [?]     −100
    Negro     −100     −100     −100     −100

38. Negro lawyers do not take extra courses in law after receiving their law degree because they have little opportunity to use their knowledge.
    White        3      −28      −33      −17
    Negro      −16      −22      −19      −24

39. Negro offenders under legal age are more often put into institutions with hardened criminals than white offenders of the same age.
    White        0       36       42       23
    Negro       66       57       58       57

40. Many Negroes never bring suit against white persons, regardless of the amount of proof, because they feel that they are bound to lose in court.
    White       10       61       61       61
    Negro       62       64       62       66

41. In cases of erroneous conviction of Negroes nothing is ever done to the person or persons who originally brought the false accusation.
    White        0       10       12        2
    Negro       69       45       37       50

42. In criminal institutions Negro offenders are segregated into poorer quarters, given the most laborious tasks, and are in other ways maltreated.
    White      −10       14       15       14
    Negro       74       50       45       54

43. In the South the sentencing of a Negro vagrant to a chain gang is usually a life sentence.
    White       −3      −65      −68      −57
    Negro       12       13      −10       28

44. Negroes receive as much justice in courts as do whites of similar social status.
    White      −15      −11      −10      −13
    Negro      −39      −37      −36      −38

45. A Negro lawyer in the North receives fairer recognition in court than a Negro lawyer in the South.
    White       −5       77       78       75
    Negro       78       85       88       83

46. Negroes would not know that they were treated unjustly if it were not for agitative organizations.
    White      −22      −33      −36      −27
    Negro      −47      −49      −49      −43

47. White policemen accept bribes from Negroes only to have them arrested for bribery.
    White      −39      −67      −66      −70
    Negro      −42        1      −19       15

48. White policemen accept bribes from whites and do not have them arrested.
    White      −41       27       35        3
    Negro       25       40       30       46

49. Negro lawyers are shy about going up against white lawyers.
    White      −24       24       23       28
    Negro      −54      −12      −26       −2

50. Negroes in reality convict themselves in court due to ignorance of legal procedure.
    White       29        7        9        2
    Negro       14       33       36       31

51. In democratic countries where everything depends on majority-vote courts are necessarily unfair to members of minority groups.
    White      −69      −23      −28      −10
    Negro      −40       27       29       25

52. All cases of whites vs. Negroes or vice versa should be tried in Federal courts.
    White       −8      −33      −31      −39
    Negro      −13       38       42       36

53. A shabbily dressed, “hat-in-hand” Negro lawyer has more influence with the court than a well groomed, intelligent Negro lawyer.
    White      −64      −47      −42      −59
    Negro      −46      −48      −40      −53

54. The victories Negroes win in court are left-handed victories.
    White      −24      −34      −32      −44
    Negro      −22      −16      −16      −15

55. Negroes are not as financially able as whites to purchase justice in courts.
    White       33       65       68       56
    Negro       73       76       77       75

56. It will take many more years before the Negro can have his constitutional rights granted him by courts.
    White       28       41       43       38
    Negro       56       65       63       66
In Table 3 are presented the Pearson correlation coefficients of the
56 TF Indices for the various pairs of groups. 


Table 3

Pearson Correlation Coefficients of the 56 TF Indices for the Various Pairs of
Groups Arranged in Descending Order

Negro College Males and Negro College Females              .95
White College Males and White College Females              .94
All Negro College Students and All Negro Adults            .93
Negro College Males and White College Males                .87
Negro College Males and White College Females              .79
All Negro College Students and All White College Students  .78
Negro College Females and White College Males              .77
Negro College Females and White College Females            .76
All White College Students and All Negro Adults            .76
All Negro College Students and All White Adults            .72
All White College Students and All White Adults            .62
All Negro Adults and All White Adults                      .60
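Each coefficient in Table 3 is a Pearson product-moment correlation between two groups' TF Indices, taken across the 56 statements. The sketch below implements that formula directly. As a hypothetical illustration, it uses only Statements 1 to 10 for White College Males and Negro College Males, with values read from Table 2. Its result is therefore not expected to reproduce the .87 reported for all 56 statements.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# TF Indices for Statements 1-10 only (from Table 2), for illustration.
white_college_males = [-53, -21, -18, 38, -38, 31, 31, 46, 49, 28]
negro_college_males = [-69, 16, 35, 26, -48, -20, 23, 81, 51, 31]

print(f"r = {pearson_r(white_college_males, negro_college_males):.2f}")
```

To reproduce a coefficient from Table 3, the full 56-value columns for the two groups would be passed in instead.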





Conclusions 


The most significant conclusions which may be drawn from the results 
of this study appear to be as follows: 

1. While Negro and white attitudes towards the 56 statements relative 
to the administration of justice as affecting Negroes correlate to a signifi- 
cant degree, this resemblance is much higher between Negro and white 
college students (.78) than it is between Negro and white adults (.60). 

2. The correlation between attitudes is conspicuously higher between
Negro college students and the Negro adults (.93) than between white
college students and the white adults (.62).

3. While the resemblance in attitudes between the white college 
students and the Negro college students is high (.78), conspicuously 
higher is the resemblance between white college males and white college 
females (.94) and between Negro college males and Negro college females 
(.95). 

4. The attitudes of the Negro college females and of the white college 
females (.76) correlate less than do the attitudes of any other two inter- 
racial groups among college students. 


Received August 17, 1944. 








Values Students Reported from the Study of Emotions 


Key L. Barkley 
Woman’s College of The University of North Carolina 


Both the friends and enemies of psychology have criticised it as an 
undergraduate course in the university on the grounds that it contributes 
very little of real value to the student. The opinion has been offered 
that perhaps the worst defect in the elementary course is that it is evasive 
in application, and that this weakness prevents the student from securing 
material which actually functions in his life. That is, the student may 
get a wealth of psychological fact, but receive little of psychological value. 


Purpose 


The purpose of the project reported here was not to discover simply 
what or how much the students knew about the facts, laws, and principles 
which characterize and govern human behavior, but to find out what 
values in any way or of any kind they believed they had secured through 
the study of a topic in psychology. The word “value” was interpreted 
to the students to mean any benefit, help, or gain they had received from 
the study, or any detriment, hurt, or loss they had suffered. In order 
to make the findings as definite as possible, to give the students a restricted 
topic on which to formulate judgments, and to increase the probable 
accuracy and dependability of the findings, the study was limited to the 
topic of “emotions.” 


Subjects 


Two hundred twenty-six students of elementary psychology, who 
recently had completed their study of the topic of emotions, contributed 
their statements of values received. All of these subjects had the same 
readings assigned from two basic texts, and all of them had about the 
same number of lecture-discussion periods, namely six. 


Procedure 


Since there was no instrument available for the students to use in 
stating their judgments of values received from the study of emotions, 
the investigator had to devise one. To that end, forty-eight students 
were asked to write essays of about 200 words on the topic “The Values
I Have Received from the Study of Emotions.” The experimenter read
the essays and made a list of all the values the students said they had
received. From the collection of statements, which were kept in the
students’ own words as far as possible, a check-list was made with the
various stated values classified under nine arbitrary headings. Where
statements were changed, the changes served only to shorten or to
generalize the item.

The completed check-list was given to the students with the directions 
to check all the items which were statements of values they had received 
from the study of emotions, and to write in statements of values received 
which were not given in the list. It was made plain to all the students 
that their work with the check-list would have no relationship to their 
standing in the course in psychology which they were studying at the 
time. 


Results 


The number of people who checked the various items on the check- 
list was tabulated, and the per cent of the total of 226 subjects who 
checked each item was calculated. These percentages are shown in 
the first column of Table 1, which also presents the items in the check-list 
used in the experiment. 

Certain findings are reasonably clear from an analysis of the data. 
The most obvious one is that over 80% of the students say they received 
definite and known benefits from the study of emotions. Only 17.3% 
of the students said that the study of emotions had had very little effect 
on them as persons (see item number 73). Moreover, some of those who 
made this statement went on to explain that they had received some 
advantage from the study, because it would help them in their professions 
such as social work. 

A second clear finding is that a large number of people said they had 
received the values given in the check-list. The average number of 
persons out of the 226 who checked each item indicating a favorable 
value received was 128, or 56.6% of the group. It is obvious also that 
each person must have checked a considerable number of items. The 
average number of items checked by each person out of the total of sixty- 
four possible ones was 35.8. This means that most of the students believed 
they had received a large number of definite values from the study of 
emotions. It is probable, however, that no student received a separate 
and distinct value for each item checked, since there is considerable over- 
lapping between some items. 
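The tabulation described above can be sketched as follows. This is a minimal illustration, assuming the responses are coded as a subject-by-item matrix of 0s and 1s; the function name `tabulate` and the toy data are hypothetical, not the study's records. It also shows how the two averages in the preceding paragraph are related: when both are taken over the same set of items, they divide the same total number of checks in two different ways.

```python
import math
from typing import List, Sequence, Tuple

def tabulate(responses: Sequence[Sequence[int]]) -> Tuple[List[float], List[int]]:
    """Summarize a subject-by-item check matrix (hypothetical helper).

    responses[s][i] is 1 if subject s checked item i, else 0.
    Returns (per cent of subjects who checked each item,
             number of items checked by each subject).
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    item_pct = [100 * sum(row[i] for row in responses) / n_subjects
                for i in range(n_items)]
    items_per_subject = [sum(row) for row in responses]
    return item_pct, items_per_subject

# Hypothetical toy data: 4 subjects x 3 items (not the study's data).
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
pct, counts = tabulate(toy)
print(pct)     # [75.0, 75.0, 25.0]
print(counts)  # [2, 1, 3, 1]

# Both averages divide the same total number of checks (7 here).
mean_item_pct = sum(pct) / len(pct)
mean_subject_pct = 100 * (sum(counts) / len(counts)) / len(pct)
print(math.isclose(mean_item_pct, mean_subject_pct))  # True
```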

The check-list was fairly comprehensive, and adequate to give the 
students an opportunity to indicate all the values they believed they had 
Table 1


Showing check-list on which to indicate values received from the study of emotions, 
and the per cent of the 226 subjects who checked each item. 

Directions: Put a check in the blank space before each statement of a value you have 
received from the study of emotions. If you have received a value for which there is 
no statement, write out a statement of the value in the place provided by a blank 
number under each heading. 


I. General Values

Per Cent
Checked   Item
74.4 1. It has given me a better conception of emotions; learned what emotions 
are and what they involve; gained insight into my own emotional life. 
78.8 2. Learned when emotions occur, that emotions are the natural result of 
strong stimulation. 
62.8 3. Learned to look at my own emotional problems objectively. 


52.2 4. Learned that emotions are governed through the autonomic nervous 
system. 

74.3 5. Study of emotions has been interesting and informative. 

45.6 6. Learned about a factor that has greatly influenced student life. 

71.7 7. Discovered that psychology has some practical values right now. 

29.2 8. Improved my own general happiness through improved adjustment. 

42.0 9. Established a firm background on which to base feelings and attitudes. 

10.*


II. Development of Emotions 


86.7 11. Learned that the aim in emotional development should be to direct and 
control rather than to inhibit emotions. 

31.4 12. Learned that there is a native “core” of emotional behavior. 

82.3 13. Learned that emotions are behavior, hence subject to change and develop- 
ment, especially through learning. 

77.4 14. Learned that other people have a great influence upon our emotional 
development. 

40.3 15. Discovered some good ways/means of emotional development. 

83.6 16. Learned that emotions can be recognized, understood, and then corrected, 
or directed and controlled. 

53.1 17. Learned what should be emphasized in the emotional education of children. 

51.3 18. Learned that emotional re-education is possible; learned about emotional 
re-education, including the methods for removing undesirable emotional 
traits. 

46.0 19. It has taught me which traits to avoid and which ones to attempt to 
attain. 

35.4 20. Learned how emotions are differentiated. 

54.9 21. Learned how emotions are established and developed. 

45.1 22. Set up new goals; am now striving to emulate the standard of emotional 
stability set up in the course; I will try to meet the next set-back with firm 
resolution and not give way to emotion. 
23.*


III. Role of Emotions in General Adjustment


24. a. Emotions as Motives. Learned that emotions serve as motives. 
25. b. Emotions as Motives. This study motivated me to better efforts
toward more wholesome emotional living. 

26. Learned that emotions serve as facilitators and inhibitors of behavior in 
general. 

27. Learned that a successful life must be founded on a firm emotional basis; 
a successful life is greatly dependent upon emotional stability and emo- 
tional maturity. 


28. Helped me to see the important role they play in our lives. 
29.* 


IV. Values Connected with the Experimental Study of Emotions 


30. Learned that emotions cannot be judged from facial expressions.
31. Learned that the study of emotions is incomplete.
32. Awakened a new interest in the further study of emotions.
33. Learned how to measure emotions.
34. Discovered the difficulty of measuring emotions experimentally.
35. Made me more aware of emotions, viz., just what occurs in emotional
action.
36. Learned how closely related are fear, joy, and rage in the effect on the
body.
37.*


V. Influence of the Study of Emotions upon Relationships with Others 


38. Made student more tolerant of other people’s conditions. I am now not 
as apt to criticize friends of mine. 

39. Improved my adjustment to other people. 

40. Developed better insight into the emotional life of other people; I am 
better able to help them. 

41. Enabled me to aid another person to overcome an emotional depression. 


42. I am able to look at others’ reactions more objectively. 
43.* 


VI. Values Connected with Emotional Maturity 


44. Learned what constitutes emotional maturity, hence can work toward it 
now. I know what is expected of me as a college student. I realize the 
attributes possessed by the emotionally mature person. 

45. It helped me to tie together all the loose ends in order to make more con- 
crete my philosophy of life. 

46. Aided student to recognize her limitations and powers. I have discov- 
ered where I fall short in being emotionally mature. 

47. Discovered cause of poor adjustment to be emotional immaturity and
instability, and learned what to do about them.
Per Cent Checked


72.1 
18.6 
79.6 
53.5 


56.2 
40.7 
73.0 
32.7 


31.4 
81.9 


17.3 
48. Learned some of the causes of emotional immaturity.

49. Helped me to emancipate myself from my home. 

50. I have seen the importance of being emotionally mature and stable. 

51. Learned some of the necessary ways of solving children’s problems. 
Learned how to help them become emotionally mature. 

52.* 


VII. Values Connected with Emotional Stability 


53. Learned what constitutes emotional stability, and how to work toward it. 
It is very helpful to know what makes a person emotionally stable and 
how to acquire these things. 

54. Learned means of emotional control/restraint. 

55. Learned that if a person will do something about a troublesome situation 
he will worry less. 

56. I have gained fuller mastery of myself; attained greater emotional sta- 
bility. 

57. Learned how to overcome unfortunate moods. 

58. Learned how important it is to your health and well-being to be emotion- 
ally stable. 

59.*


VIII. Special Individual Benefits 


60. Helped me through an emotional crisis. 

61. Aided in improving efficiency in study and work. 

62. Learned that my emotional problems are not unique, but probably many 
others have them. 

63. Learned that intense emotion may be harmful. 

64. Lost queer notions, fear and apprehension regarding emotions. 

65. Helped me to quit emotionally colored thinking. 

66. Learned that most fears are unnecessary and handicapping. 

67. Evolved the practice of being honest with myself as well as with others. 

68. Study of emotions made me think and wonder about myself. 

69. The study of emotions has made me more hopeful. 

70. Learned that the facial expressions of others are not a safe guide to conduct. 

71. Learned some steps to take in solving emotional problems and will be 
better prepared to meet emergency situations. 

72.*


IX. Negative or Questionable Values 


73. The study of emotions has had very little effect on me as a person. 
74.* 


* Items numbered 10, 23, 29, 37, 43, 52, 59, 72, and 74 provided an opportunity to 
write in statements of values received not given in the questionnaire. 
received from the study of emotions. Only four new items were listed
by write-ins in the spaces provided for such additions. The remainder 
of the total of thirteen write-ins were criticisms of the course in psy- 
chology, or restatements of items already in the check-list. 

Not all students secured significant values from the study of emotions:
seventeen per cent indicated that they had been helped very little
by the study.


Some Conclusions and Criticisms 


1. The method used in this study is in many ways subjective in nature,
hence liable to errors whose presence and degree could not be checked.
For example, many of the stated values which the
students say they have received may be simply verbalizations not ac- 
companied by real changes in the personalities and adjustments of the 
students. This weakness is indicated by the fact that some subjects 
checked a number of items as values received, but checked also the 
statement to the effect that the study of emotions had had very little 
effect on them as persons. But the fact still remains that more than 80% 
of the group indicated without reservation that they had received many 
positive values from the study of emotions. 

2. The findings presented here represent the reactions of students 
in one university only where the first course in psychology is taught with 
a certain emphasis. In other universities where the first course is 
taught with a different emphasis, the responses of students might differ 
greatly from these. 

3. This survey is a preliminary one, as it obviously must be, since the 
number of subjects is small. The emphasis upon the general benefits 
received from a course of study, rather than simply upon the knowledge 
of facts secured from it, is healthy, and the technique employed here 
appears to be a useful one. In this connection it should be pointed out 
that the values listed by the students were about equally divided between 
those which were strictly a matter of acquired knowledge and those which 
were of the nature of some new skill, changed viewpoint, better method 
of adjustment, etc. But the average per cent of the group who checked
the knowledge values was 65%, whereas the average per cent checking the
other values was 48%.
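As a rough consistency check (an illustration, not part of the original analysis), the two category averages can be pooled, weighting each by its number of items. Assuming, hypothetically, that the sixty-four possible items split exactly 32/32 between knowledge values and other values (the text says only "about equally divided"), the pooled per-item average comes out close to the 56.6% overall figure reported under Results.

```python
def pooled_mean(means, counts):
    """Average of group means, weighted by the number of items in each group."""
    return sum(m * n for m, n in zip(means, counts)) / sum(counts)

KNOWLEDGE_PCT, OTHER_PCT = 65, 48  # average per cent checking each kind of item
TOTAL_ITEMS = 64                   # possible items on the check-list

# Hypothetical exact split; the source says only "about equally divided".
print(pooled_mean([KNOWLEDGE_PCT, OTHER_PCT], [32, 32]))  # 56.5

# Sensitivity of the pooled figure to the exact split
for k in range(28, 37, 2):
    split = [k, TOTAL_ITEMS - k]
    print(split, round(pooled_mean([KNOWLEDGE_PCT, OTHER_PCT], split), 1))
```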

4. Even though 80% of the students said they received many values 
from the study of emotions in elementary psychology, something should 
be done to reduce the percentage of students who say they get little or 
no benefit from it. In this University over three-fourths of the students 
who take the first course in psychology in normal times do not go on to 
higher courses. The first course is often psychology’s last chance to
bring benefits into the lives of students through classroom instruction. 


Received September 18, 1944. 
Aircraft Recognition: I. The Relative Efficiency of
Teaching Procedures * 


Lester Luborsky 
Duke University 


Thus far the teaching of aircraft recognition has not been subjected 
to any reported experimentally-controlled analysis of learning product 
or comparison of teaching techniques.¹ The present Navy recognition
program is based largely upon the groundwork of general principles arising 
from the research of Renshaw of Ohio State and his students, Schwarzbek 
(8), Knight (3), and others (1, 2, 5, 10). Their experiments led to such 
current emphases in recognition training as high speed presentation and 
perceptual learning of wholes rather than verbal learning of parts. 
Verbal learning of parts is emphasized in the WEFT system in which the 
characteristics of 4 main parts (the wings, engine, fuselage, and tail)
are memorized.

The standard instructional procedure—which has proved highly 
successful—used in most Navy recognition schools is somewhat as follows: 
When new planes are first introduced the shutter is set on “time” ex- 
posure—i.e., the shutter remains open and the airplane on the projection 
screen is visible until the plunger is pressed again—while good recognition 
features are pointed out by the instructor. Then, in the regular practice 
sessions which follow, the views are exposed for a short period, from 1” 
to 1/50”, depending upon the particular school, and identification is made 
by the student. After each such identification, except during tests, the 
same view is reexposed for 3”-10” and reidentified and discussed. It is
thought best to use a large number of different views of each plane.
The number of different views varies from 5 or 6 to a new view each
time the plane is shown. The most usual rate of introducing new planes
is 2 planes per session.²

* Part of a thesis submitted in partial fulfillment of the requirements for the degree 
of Doctor of Philosophy in the Graduate School of Arts and Sciences of Duke Univer- 
sity, 1945. Grateful acknowledgment is made of the assistance of two members of the 
Department of Psychology, Duke University: Dr. Karl Zener, for sponsorship of the
project, and Dr. Sigmund Koch, for valuable advice.

1 Since completion of this experiment a summary of the research by Staff, Psycho- 
logical Test Film Unit (9) has reported work on aspects of training procedures related to 
those dealt with in the present report. 

2 For valuable discussion of current training techniques and for access to classes the
author is indebted to Lt. Comdr. H. L. Hamilton, Lt. W. C. Schwarzbek, and Lt.
Comdr. R. H. Bruce of the U. S. Navy Pre-Flight School, Chapel Hill, North Carolina.
Renshaw and later workers developed the present program em-
pirically on the basis of their past research findings. As a consequence, 
many differences in practice resulted, such as the variation in use of 
training exposures from 1/50” at some schools to 1” at others. Vari- 
ability also occurs in the number of views presented and in the rate of 
teaching new planes. 

It thus seemed desirable to test the relative effectiveness of several 
present differences in practice, and, in general, to subject a number of 
major variables of the training technique to experimental analysis. In 
brief, the object of the experiment was to determine, as exhaustively as 
possible, the effect of 4 basic variations of aircraft identification instruc- 
tional procedure on the total learning behavior of equated classes of 
subjects. The design of the experiment was essentially: 

1. The administration of a battery of tests calculated (a) to measure 
possible perceptual and learning factors associated with “aircraft
identification ability” and (b) to aid in equating the groups for possible factors
affecting trainability. 

2. The actual training of 4 equated groups under different instruc- 
tional conditions. 

3. The administration of an extensive battery of post-training tests 
in order to facilitate more complete analysis of the learning product for 
the 4 groups. 

The resulting data offer the opportunity for analysis of many of the 
determinants of aircraft recognition ability. The present report will be 
restricted to the major question of the relative efficiency of performance 
as a function of 4 training techniques. A future article will take up the 
role of other factors, including the span of apprehension, various measures 
of visual memory, and a fuller treatment of reaction time. 


Procedure 


Four equated groups, 8 Navy students in each (6 in Group I), were 
taught aircraft recognition in a standardized manner for 45 minutes on 
Monday, Wednesday, and Friday for a 6 week period in May-June 1944. 
For each of 3 of these groups a different experimental condition was varied, Group I serving as the control.

The experimental groups were designed to compare the effects of two 
kinds of short exposure,³ 1/50” vs. 1”, both used in conjunction with
approximately the same number of long exposures of the same stimuli
(Group I vs. Group III);⁴ the presentation of a large vs. a limited number

3 For the purposes of simplicity, exposure times will be called 1/50” and 1”, although
the calibration values give slightly different results (see Apparatus, p. 389).

4 The total number of 1/50” exposures given to Group I was approximately 195 up
to and including Test 9. An approximately equal number of exposures between 1/10”
and 1/50” were given in tests before the 1/50” speed was attained.
of views (Group I vs. Group II); and the slow introduction of new planes
vs. a rapid introduction, followed by review of confused planes (Group
I vs. Group IV). In other respects an attempt was made to make the
procedures comparable to those most frequently used in Navy recognition
schools. For all groups in which tests were given at 1/50” during
training, the speed was attained by progressively reducing the exposure time
during tests from one session to the next, starting at 1” at the beginning
of the second week and reaching 1/50” by the last session of the third
week. At each session three new planes were taught until, by the middle 
of the fourth week, 33 planes had been presented. By the end of the 
training period approximately 8 different views of each plane were used. 
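The session arithmetic in this paragraph can be checked with a short sketch, assuming sessions are numbered consecutively at 3 per week (Monday, Wednesday, Friday). The helper names are hypothetical. The text does not say how the exposure time was stepped down between 1” and 1/50”; the geometric (constant-ratio) schedule below is a hypothetical choice for illustration only, and it uses the nominal rather than the calibrated shutter values.

```python
SESSIONS_PER_WEEK = 3        # Monday, Wednesday, Friday
NEW_PLANES_PER_SESSION = 3   # rate stated for the standard procedure

def session_index(week: int, meeting: int) -> int:
    """1-based session number; meeting is 1 (Mon), 2 (Wed) or 3 (Fri)."""
    return (week - 1) * SESSIONS_PER_WEEK + meeting

def planes_presented(through_session: int) -> int:
    """Cumulative number of new planes taught by the end of a session."""
    return through_session * NEW_PLANES_PER_SESSION

def geometric_schedule(start: float, end: float, steps: int) -> list:
    """HYPOTHETICAL: exposure times shrinking by a constant ratio from start to end."""
    ratio = (end / start) ** (1 / (steps - 1))
    return [start * ratio ** k for k in range(steps)]

mid_week4 = session_index(4, 2)                 # Wednesday of week 4
print(mid_week4, planes_presented(mid_week4))   # 11 33

first = session_index(2, 1)   # first session of week 2
last = session_index(3, 3)    # last session of week 3
schedule = geometric_schedule(1.0, 1 / 50, last - first + 1)
for session, seconds in zip(range(first, last + 1), schedule):
    print(f"session {session}: {seconds * 1000:.0f} ms")
```

The faster rate used for Group IV is not given numerically in the text, so it is not modeled here.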

In Group II the exposure time and other variables were similar to 
Group I except that the number of views of each plane was limited to 
3 diagram views. Group III was given 1” exposures and Group I, 
1/50” exposures during tests. As will be indicated below in the descrip- 
tion of training procedure, exposures for review and teaching were longer. 

In Group IV an answer was sought to the question of the effect of 
teaching new stimulus material at almost twice the usual rate and then, 
after completing the syllabus, giving special instruction on the most 
frequently confused stimuli. As part of this special instruction, the most 
frequently confused planes were shown simultaneously on the screen and 
distinguishing features were stressed. When a large group of difficult,
homogeneous views were reviewed in this manner (e.g., head-on views of
single-engine fighter planes), the students sketched these while learning.

Training Procedure. Training, as described in the paragraph below,
comprises all the procedures directed toward the subjects’ learning of
the present aircraft recognition task. Training includes the frequent
aircraft recognition tests and the review which followed these tests,
during which each plane was reexposed. Training also includes the
teaching of new planes and the review following this teaching,
and the home study done by each subject. The paragraph below de- 
scribes the training aspects of a typical session. In addition to this, 
training continues outside of class in home study. All groups were 
given the same general training procedures, with the exception of the 
experimental training variables mentioned above (exposure time on 
tests, number of views of each plane, and the rate of introducing new 
planes). 

During the first 3 minutes required for dark adaptation, announce- 
ments were made of scores in the last test, etc. Then a test was given of 
planes taught up to that session in which subjects recorded the name of 
each plane on a specially lined mimeographed sheet. This required 10 
to 18 minutes, depending upon the number of planes, which increased in 
successive tests as training was continued. The rate of presentation 
was 4 per minute. The experimenter, by watching the face of a large
sweep-second hand clock, maintained a rather constant presentation 
tempo. All planes in this test were then reviewed for instructional 
purposes on time exposure. The time exposures were approximately 
3” to 6”. The subjects called out the name of each plane and were 
corrected when necessary. Subjects’ questions were answered at this 
time to avoid making the review too mechanical. In the last 4 training 
sessions and in the “training-II”⁵ sessions, the exposure time during
this review was changed from about 3”-6” to one or more 1/10” exposures
in order to give more practice with short exposures. For Group III the 
review exposure time was changed to 1” at this time. The number of 
presentations in this review was determined by the number needed to 
equalize the total number of presentations for each group for the entire 
course, e.g., Group IV was given fewer since more planes were covered 
in each session. Approximately 3 minutes were then devoted to the 
teaching of each new plane. The head-on, plan, and side diagrams were 
viewed in succession, and good recognition features were emphasized by 
the instructor. After each view, reminders were given to attempt a vis- 
ualization of the plane from all angles, to avoid irrelevant cues, and to 
make outline sketches while learning the plane. In the early part of the 
course, some suggestions for efficient study had been given and these were 
frequently reiterated. In addition, an interest item was mentioned for 
each plane, such as its use, or an example of its outstanding performance. 
The new planes were then reviewed on time exposure while the class 
called out the code number and name. Exposures were then speeded 
up to about 1/5” in the latter part of this review. For Group III, how- 
ever, no exposures less than 1” were given. 

The 4-week training period, whose instructional procedures have been
described above, was followed by a 2-week test and “training-II” period.
During these 2 weeks training was continued, except that all classes
were now given identical tests and trained under the conditions of
Control Group I, i.e., all groups were given 1/50” training, and picture
as well as diagram views were included. In this 2-week period, 3 main
training-II tests were given: Test 10, Test 11, and the Final Test. These
tests will be described later in the section called “Tests of Recognition
Performance during Training.” Other tests were also given in other
sessions of the training-II period; these will be referred to as
“post-tests,” in contradistinction to the “pre-tests.” These post-tests were
either repetitions of the pre-tests or tests of other aspects of the learning
product.

To achieve uniformity in the use of study materials, all students were 


5 See paragraph following. 
supplied with two standard aircraft recognition books (6, 7) and required
to avoid any other material. A questionnaire on study methods filled 
in by the students near the termination of the training period shows that 
in general these books were the only material used. 


Apparatus 


The 8 subjects in each group were seated about 5 feet from a projec- 
tion screen. A constant seating position was maintained for each sub- 
ject. The arrangement of seats was such that 4 subjects sat slightly 
out of the center line. This probably did not influence the pre- and 
post-tests or the rate of learning, since an analysis by seating position 
shows no differences related to seating. The experimenter operated a 
Baloptican, with an attachment supporting an iris-diaphragm shutter. 
This apparatus was placed immediately behind the students. Stimulus 
cards were fitted into a specially constructed postcard holder⁶ which
permitted ready changing of cards—in as short a time as 4 seconds. 
When the ready signal was given, the subjects fixated the center of the 
screen, and immediately after, the exposure was released by pressing the 
shutter-plunger. 

Photographic calibration of the shutter was performed twice before 
and once after the experiment. The shutter speeds called 1/50” and 1” 
in this experiment gave calibration values of 1/40” and 4/5” respectively. 
Percentage changes from before to after the experiment were less than 6%. 
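Converted to milliseconds, the calibration figures show how far the delivered exposures departed from their nominal settings. This small sketch uses only the values reported above. One consequence is worth noting: the ratio between the long and short exposures actually delivered was about 32:1 rather than the nominal 50:1.

```python
from fractions import Fraction

# Nominal shutter setting vs. photographic calibration value, in seconds
shutter = {
    '1/50"': (Fraction(1, 50), Fraction(1, 40)),
    '1"': (Fraction(1), Fraction(4, 5)),
}

for label, (nominal, calibrated) in shutter.items():
    deviation_pct = (calibrated - nominal) / nominal * 100
    print(f"{label}: nominal {float(nominal) * 1000:.0f} ms, "
          f"calibrated {float(calibrated) * 1000:.0f} ms ({float(deviation_pct):+.0f}%)")

# Ratio of the long to the short exposure actually delivered
ratio = shutter['1"'][1] / shutter['1/50"'][1]
print(ratio)  # 32
```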

Two sets of screen illumination readings were taken with a Macbeth 
Illuminometer. One set was taken with the room in semi-darkened 
condition and the other with the shutter opened on time exposure (with 
a blank card in the Baloptican). In the semi-darkened condition the 
screen reflected evenly 0.28 foot candles from a 15 Watt gooseneck lamp 
in the center of the room. The subjects recorded their responses to tests 
in this low light. The screen reflected evenly 2.55 foot candles of light 
when the shutter was open on time exposure. This was the illumination 
reflected from the screen when a plane was shown. 
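For readers more used to SI units, the two illumination readings can be converted to lux at 1 foot-candle = 10.764 lux (one lumen per square foot). The sketch below only restates the reported readings; it shows that the screen was roughly nine times brighter during an exposure than under the working light in which responses were written.

```python
LUX_PER_FOOTCANDLE = 10.7639  # 1 lm/ft^2 expressed in lm/m^2

ambient_fc = 0.28   # semi-darkened room; subjects wrote their responses in this light
exposure_fc = 2.55  # shutter open on time exposure with a blank card

for name, fc in [("ambient", ambient_fc), ("exposure", exposure_fc)]:
    print(f"{name}: {fc} fc = {fc * LUX_PER_FOOTCANDLE:.1f} lux")

brightness_ratio = exposure_fc / ambient_fc
print(f"exposure/ambient ratio: {brightness_ratio:.1f}")
```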

Twenty-five American and 11 British planes were selected for the 
syllabus because of their frequent use in combat. 


Comparison of Group Scores 


What is the effect of each of the experimental variables on the learning 
of each group? The complete data for training-I and training-II tests 


6 This postcard holder was constructed with a thin slot through which the stimulus
card could be slipped in place for projection without the necessity of removing the post- 
card holder from its position beneath the Baloptican to change stimulus cards. 
are given in Figure 1. The actual per cent correct scores for 5 of these
tests which are especially important are given in Table 1A. The first 
of these, Average of Tests 1 and 2, may indicate the amount of the 
differences existing between groups before training. Test 9 indicates 
the effect of the experimental variables when tested under the conditions 
of these experimental variables. Test 10 is the most important test of all 
since it is given to all groups under identical conditions immediately after 
Fig. 1. Per cent correct scores on 9 training-I tests and 3 training-II tests (see
text). The 4 upper score curves for training-II tests were given at 1” exposure and
the lower at 1/50”.


completion of training-I and reveals most clearly the differential effects 
produced by the different training procedures. Test 11 and the Final 
Test are important in showing the consistency of the differences, but 
measure only residual effects of the training procedures. In the more 
detailed presentation to follow, the scores on Test 10 are italicized to 
emphasize their importance. 

1. Limitation in Number of Views: Group I vs. Group II. A comparison
of scores for Group I (the 1/50” exposure group) and Group II
(the limited-view group) indicates a small but consistent inferiority in 
Group II. Scores (per cents correct) on Average of Tests 1 and 2 and 
Table 1A


Per Cent Correct Scores on Two Training-I and Three Training-II Tests

                     Training-I Tests            Training-II Tests
           Av. of Tests
             1 and 2      Test 9     Test 10        Test 11        Final Test
Exposure
Time           1”         1/50”    1/50”   1”     1/50”   1”     1/50”   1”***
Group
  I            75          51       55     71      60     73      50     80
  II           75          65       45     67      50     67      53     68
  III          81          82*      59     87      64     85      61     83
  IV           80          62       66**   83      67     86      62     86

* Group III was given 1” exposure.

** Two weeks had elapsed between completion of syllabus and this test because of
the accelerated schedule for introducing new planes.

*** This test may be somewhat less reliable than the others since scores are based on
a test of only ten planes.


Table 1B


Per Cent Correct of Picture and Diagram Views for Each 1/50” Training-II Test

             Test 10         Test 11        Final Test
Group      Pic.  Diag.     Pic.  Diag.     Pic.  Diag.
  I         50    60        55    64        48    52
  II        28    63        52    63        48    58
  III       52    67        65    62        58    64
  IV        58    74        66    68        61    63


Table 1C


Recognition Time for Planes (recorded on the planes in Test 10 in 1/60” units)

Group     Recog. Time
  I          69.7
  II         56.9
  III        54.3
  IV         51.1

the three 1/50” training-II tests are respectively as follows: Group I,
75, 55, 60, and 50%; Group II, 75, 45, 50, and 53%. None of the
differences between the groups, as indicated by t ratios, reaches the
5% level of significance (see Table 2). For the 1” training-II tests,
however, scores for Group I are somewhat better than for Group II.
Again none of the differences reaches the 5% level, but all favor
Group I.

Analysis of the percentages of picture vs. diagram views recognized 
correctly on Test 10, Test 11, and Final Tests (see Table 1B) reveals the 
source of Group II’s low scores as due to poorer performance on picture 
as compared with diagram views. In Test 10, scores for diagram views 
are almost the same whereas for picture views the scores are 50% vs. 
28% for Groups I and II respectively. Of course, the particular picture 
views used were completely new to both groups, the past difference in 
training being that Group I (and other groups) had had practice with 
other views and Group II had had only the three stereotyped diagram
views. 
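The picture/diagram breakdown in Table 1B can be checked against the overall Test 10 scores in Table 1A. Assuming (this is not stated in the text) that Test 10 contained equal numbers of picture and diagram views, each group's overall score should be close to the mean of its two sub-scores. The figures bear this out, and they isolate Group II's deficit on picture views.

```python
# Test 10 (1/50") per cent correct: (picture views, diagram views), from Table 1B
test10_split = {"I": (50, 60), "II": (28, 63), "III": (52, 67), "IV": (58, 74)}
# Test 10 (1/50") overall per cent correct, from Table 1A
test10_overall = {"I": 55, "II": 45, "III": 59, "IV": 66}

for group, (pic, diag) in test10_split.items():
    # Simple mean is valid only if picture and diagram views are equally numerous
    expected = (pic + diag) / 2
    print(f"Group {group}: mean of sub-scores {expected:.1f}, "
          f"reported {test10_overall[group]}, diagram minus picture {diag - pic} points")
```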

2. Exposure Time: Group I vs. Group III. Group III (the 1”
exposure group) appears to be somewhat superior to Group I. On
Average of Tests 1 and 2 and the three 1/50” training-II tests, scores
for Groups I and III are 75, 55, 60, and 50% and 81, 59, 64, and 61%
respectively. The corresponding t ratios are statistically unreliable
but all favor Group III.

Another aspect of the relative efficiency of performance of Groups I 
and III can be measured in terms of recognition time. This comparison 
should be important because proponents of the use of 1/50” exposures in
training may maintain that there are other advantages which do not 
emerge from comparisons of per cent correct scores, especially the im- 
portant advantage of more rapid recognition. In Test 10, given immedi- 
ately after completion of training, subjects were required to indicate 
recognition by lifting their forefinger from a response key in circuit with 
a standard timer. The name of the plane, given verbally by the subject, 
was recorded by an assistant. The results in Table 1C show conclusively 
that an advantage for Group I in recognition speed did not exist. Fur- 
thermore, accuracy and speed of recognition in Test 10 for all subjects 
proved to be uncorrelated (r = -.16).
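Table 1C records times in units of 1/60 second. Converting them to seconds makes the point of this paragraph concrete: Group I, the group trained at 1/50”, was the slowest of the four, roughly a quarter of a second slower than Group III. The dictionary below simply restates the table.

```python
UNITS_PER_SECOND = 60  # Table 1C times are recorded in 1/60-second units

# Recognition time on Test 10, in 1/60-second units (Table 1C)
recognition_units = {"I": 69.7, "II": 56.9, "III": 54.3, "IV": 51.1}

def to_seconds(units: float) -> float:
    """Convert a time in 1/60-second units to seconds."""
    return units / UNITS_PER_SECOND

for group, units in recognition_units.items():
    print(f"Group {group}: {to_seconds(units):.3f} s")

slowest = max(recognition_units, key=recognition_units.get)
gap = to_seconds(recognition_units["I"] - recognition_units["III"])
print(slowest, f"{gap:.2f} s")
```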

3. Rate of Introduction of New Planes: Group I vs. Group IV. Group 
IV (the rapid-presentation-plus-review group) is superior to Group I. 
Scores on Average of Tests 1 and 2 and the three 1/50” training-II tests 
were 80, 66, 67, and 62%, as against corresponding values of 75, 55, 60, 
and 50% for Group I. The t ratios for the differences between Groups
I and IV on training-II tests are 1.83, 1.31, and 2.29. Although only 
the last of these is significant at the 5% level, the direction of all of them 
favors Group IV. 

A second, and obviously related aspect of the results concerns the 
intercomparison of all groups. Groups III and IV give the best results 
of all 4 groups. After completion of the 4-week training period, it was
found that Group III did as well as both Groups I and II at 1/50”
exposures (Test 10) even though no previous training on planes at 1/50”
had been given. Furthermore, Group III did somewhat better on this
test than any other group with 1” exposures. Group IV seems to be
slightly better on the Final Test at both 1/50” and 1” than any other
group. The t ratios for the differences between Groups II and III on
the three 1/50” training-II tests are 1.41, 0.83, and 1.25, and for
Groups II and IV they are 2.49, 1.59, and 1.95.

Final evaluation of the differences in this section depends upon the 
reliability of the differences and the comparability of groups which are 
discussed below. 
Reliability of Differences


How much confidence can be placed in the obtained differences?
Since the number of subjects in each group is small and the absolute
values of many of the differences are not particularly great, careful
consideration must be given to this question.


Table 2

Reliability of Differences Between All Groups on Average of Tests 1 and 2 and
Training-II Tests (t ratios)*

                 Av. Tests     Test 9        Test 11
Groups           1 and 2      1/50”   1”    1/50”   1”

                 0.00   1.20   0.33   0.23
                 0.87   0.51   1.60   0.58
IV               0.73   1.83   1.31
II               1.01   0.83   2.49   1.50
II  IV           0.79   2.49   1.57   2.12
                 0.14   0.80   0.82   0.20   0.53

* If underlined, the difference is significant at the 5% level, i.e., the t ratio is
greater than 2.17 for differences involving Group I and greater than 2.14 for the others.


Table 2 presents the t ratios for the differences emerging from the
intercomparison of the performance of all groups on the Average of Tests
1 and 2 and on the 3 crucial training-II tests with 1/50” and 1” exposures.
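The t ratios in Table 2 compare group means from very small groups. The sketch below shows one common way to compute such a ratio. It divides the difference between two group means by the pooled standard error of that difference, with n1 + n2 − 2 degrees of freedom. The article does not state which variant of the formula was used, so this is an assumption. With equal group sizes, however, the pooled and unpooled standard errors coincide. The scores are hypothetical and are not the study's data. The 2.14 criterion is taken from the footnote to Table 2.

```python
import math
from statistics import mean, variance


def t_ratio(a, b):
    """Pooled-variance t ratio for the difference between two independent group means.

    Returns (t, degrees of freedom).
    """
    n1, n2 = len(a), len(b)
    # statistics.variance uses the n - 1 divisor (sample variance)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se_diff, n1 + n2 - 2


# Hypothetical per cent correct scores for two groups of 8 subjects each
group_x = [70, 62, 75, 68, 59, 72, 66, 64]
group_y = [60, 66, 58, 63, 55, 68, 57, 61]

# Criterion from the Table 2 footnote for comparisons not involving Group I
CRITERION_5_PERCENT = 2.14

t, df = t_ratio(group_x, group_y)
print(f"t = {t:.2f} with {df} df; exceeds 5% criterion: {abs(t) > CRITERION_5_PERCENT}")
```

With these made-up scores, a 6-point difference between means of 8 subjects each just clears the criterion. This shows how large a difference must be before groups this small yield a significant t ratio.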

Although most of these t ratios are not statistically significant,
examination of Figure 1 will reveal that the results show a consistent
direction. It is this consistency of direction which suggests that the
main differences, i.e., those between the first 2 groups (I and II) and
the second 2 (III and IV), may be real ones.


Comparability of Groups 


More important than statistical reliability in the case of such small
groups is the question of how rigorously all variables other than the
experimental ones were equated. Before training, the groups were
equated fairly satisfactorily for 2 factors of evident importance for
trainability: intelligence and previous knowledge of planes. The results
of this equation will be presented in Table 3, together with the results
of tests of the other factors which might have been responsible for
differences in the trainability of the groups. In the following list of
these factors, the name of each measure gives some indication of its
nature; a complete description of each will be given in another report
(4). The list is: 1. Intelligence (A. C. E.); 2. Previous Knowledge
of Planes; 3. Grades in Academic Work for Previous Semester; 4. Home
Study Time for Aircraft Recognition; 5. Acuity (Snellen Symbol E
Chart); 6. Interest I; 7. Interest II (Improvement); 8. Reaction Time
(Light Flash; Common Objects; Airplanes); 9. Span of Apprehension,
1/50”; 10. Memory-Location, 1/50”; 11. Memory-Complex Figures,
Part 1, 1/50”; 12. Memory-Complex Figures, Part 2, 1/50”; 13. Memory-
Complex Figures, 5”; and 14. Memory-Aircraft Recognition Aspects,
1/50”.

Although the differences between groups on the above tests (see
Table 3) appear small enough to disregard if taken singly, Group I, and
to a lesser extent Group II, is lower on several tests (Memory-Complex
Figures, 5”, and Memory-Complex Figures, Part 1, 1/50”) which have
been found (4) to be important in determining final level of performance.
Since the effect of these differences cannot be determined, the
differences between groups on the aircraft recognition tests must be
somewhat larger than they otherwise would have to be in order to be
meaningful. This will primarily affect the conclusions based upon
Group I vs. Group III.

Other differences not covered by the above measures may have
existed. For example, the differences in group scores obtained on the
Average of Tests 1 and 2, ranging from 0% to 6%, possibly were not
produced by the experimental variables in training procedure, although
these variables may have slightly affected the scores of Groups IV and
II. This may be a crucial criticism. However, a reanalysis of the data
on the basis of a re-equation of groups shows that it does not
materially affect the results. The re-equation was carried out as
follows. Subjects were chosen in sets of 4, one from each group, whose
Average of Tests 1 and 2 scores were as nearly alike as possible. It was
possible to find a low, a high, a low-middle, and a high-middle score
which were almost the same in each group. The means of the resulting
4 scores, one mean for each group, did not differ by more than 0.5%.
These re-equated group means on the Average of Tests 1 and 2 were then
compared with the means of the same subjects on Test 10, Test 11, and
the Final Test. Virtually the same relative differences among the scores
were obtained as with the original data. The re-equated Group IV,
however, was slightly higher on the Final Test.
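The re-equation described above can be sketched as a small matching routine. The article describes the selection only in words, so the mechanics here are an assumption. Each group contributes the subject whose Average of Tests 1 and 2 score lies closest to each of four target levels (low, low-middle, high-middle, high). The resulting group means are then checked against the 0.5% tolerance the author reports. The target levels and all scores are hypothetical.

```python
from statistics import mean


def match_to_targets(groups, targets):
    """For every group, pick (without reuse) the subject whose score is closest to each target."""
    matched = {}
    for name, scores in groups.items():
        remaining = list(scores)
        picks = []
        for target in targets:
            best = min(remaining, key=lambda s: abs(s - target))
            remaining.remove(best)  # each subject can be used only once
            picks.append(best)
        matched[name] = picks
    return matched


# Hypothetical Average of Tests 1 and 2 scores (per cent), 8 subjects per group
groups = {
    "I": [40, 48, 52, 55, 58, 63, 67, 72],
    "II": [42, 47, 51, 56, 60, 62, 67, 75],
    "III": [38, 49, 53, 57, 61, 64, 68, 70],
    "IV": [41, 46, 51, 54, 59, 65, 69, 73],
}
targets = [42, 52, 60, 70]  # hypothetical low, low-middle, high-middle, high levels

matched = match_to_targets(groups, targets)
group_means = {name: mean(picks) for name, picks in matched.items()}
spread = max(group_means.values()) - min(group_means.values())

for name, picks in matched.items():
    print(name, picks, group_means[name])
print(f"largest difference between group means: {spread}")
```

Matching each group to the same fixed targets is only one way to find "almost the same" scores. A hand search, as the author appears to have done, can reach the same result when the groups are this small.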
Discussion


A number of suggestions for optimum training procedure may be 
immediately inferred from our results: 

1. Group II procedure seems to be the least efficient. Therefore 
teaching materials which are restricted to only 3 diagram views of each 
plane, as are those of many spotters’ courses, are disadvantageous for 
proper learning, and cause difficulty in recognizing other, more life-like 
views.’ 

2. A training procedure including tests at exposures varying from
1” to 1/50” has no evident advantages over a procedure having no
exposures shorter than 1” (Group I vs. Group III). The slight apparent
superiority of Group III on aircraft recognition tests is difficult to
evaluate because of the small initial superiority of this group on the
Average of Tests 1 and 2 and on some of the pre-tests which are highly
correlated with final level of performance. There are three possible
interpretations of the failure of Group I to show any superiority over
Group III on Test 10, which constituted Group III's first experience
with 1/50” exposures. First, there may actually be no special skill for
seeing under short exposure conditions. Second, the development of such
a skill may not readily occur in the absence of conditions permitting
operation of the law of effect, that is, more immediate identification
of the plane, either verbally or by re-exposure for a longer time.
Finally, a longer period of training with 1/50” exposures, as is usually
found in recognition programs, may be necessary.

The data presented in the next article (4) would enable a more com- 
plete equation of groups than was possible before this experiment and 
would open the way to a definite answer to the problem. 

Since both speeds tested in the present experiment are within the range
considered short exposure, they offer no direct implications as to the
relative advantages of short vs. long exposures. (Long exposures are
here considered to be 3 to 10 seconds.) Possibly some combination of
long and short exposures may be more efficient than predominant use of
either; this technique is employed in some recognition schools. In
addition, the 1” and 1/50” exposures were used more consistently and
definitively here than is usual in current practice, which afforded
clearer insight into the effects of these procedures.

3. With the rapid-presentation-plus-review procedure, new planes can be
introduced at almost twice the usual rate with no impairment in
efficiency, within a total training time comparable to that of the
other teaching procedures.

7In addition, there seems to be a greater tendency to use inadequate and trick 
recognition features. At the end of the course students were asked to confess any 
trick cues they had been using. Group II seemed to excel in their use. 
There is strongly suggestive evidence that it is the most effective of the
procedures used. Its superiority may be the result of a longer period 
of time during which all planes have been seen and a longer period of 
time for stressing those planes which present the most difficult differentia- 
tion problems. 

Since the procedures for Groups III and IV were found to be the most
efficient, a combination of both might be more efficient than either
one separately. This is one interesting possibility for further research.


Summary and Conclusions 


Four equated groups, of 8 pre-aviation (V-5) students each, were 
taught aircraft recognition in a standardized manner. The major 
experimental variable in each group was as follows: Group I, 1/50” 
exposure time; Group II, only three views of each plane; Group III, 1” 
exposure time; and Group IV, presentation of the entire syllabus in 
almost half the usual time, followed by a review emphasizing confused 
planes. 

The following conclusions were obtained. 


1. The use of 1/50” exposures as part of training in which an ap-
   proximately equal number of longer exposures are given has no
   ascertainable advantages over 1” exposures similarly given with
   longer exposures.

2. Restriction of teaching materials to only three views of each plane
   results in learning which generalizes poorly to views of planes
   other than those taught and is, in this sense, inefficient.

3. Rapid teaching followed by review of confused planes is probably
   the most efficient of the procedures tested.


These findings might be useful for incorporation in actual training 
courses. 


The battery of tests which was found (4) to be important for deter- 
mining the final level of performance will now make possible a more 
complete equation of groups and therefore future experiments which can 
yield more definite answers to problems such as the above. 


Received October 18, 1944. 
Magazine vs. Personal Interview Votes in the Consumer
Jury Advertising Test 


Lester Guest 
The Pennsylvania State College 


For many years the consumer jury or opinion method of testing the 
effectiveness of advertisements has been a useful device for estimating 
the relative effectiveness of advertisements before they appear. In 
essence, respondents are usually personally interviewed and asked which 
of several advertisements interests them the most or would most likely 
influence them to buy. 

The completeness of the make-up of the advertisements at the time of 
the test, the number of advertisements that can be judged at one time, 
and the form of the question posed the respondent have all been subjected 
to research (1) (3).¹ It has been conceded that, as in all sampling studies,
the sample should be representative of the purchasers or potential 
purchasers of the product in question. The final answers to some of the 
other problems await some sort of definitive criterion of effectiveness. 
However, assuming that respondents’ stated preferences are automatically 
valid for the degree to which they would be likely to read an advertise- 
ment, some questions can be tentatively answered. 

In regard to the form of the question, the consensus seems to indicate 
that respondents should be asked to select the advertisement which 
appeals to them the most, or that interests them the most, or the adver- 
tisement that would most likely lead them to buy, rather than asking 
them to select the best advertisement (2, p. 369) (3, p. 124). (There are 
those who say that the wording makes little, if any, difference (1, p. 18).) 
The reason for this distinction in question wording arises from the belief 
that the interviewee should be asked to react to the advertisement as 
a consumer and not as a critic of advertising. From the current point of 
view, there is no best advertisement apart from the reader’s own prefer- 
ence. From a practical point of view, it makes no difference whether 
the advertising expert believes that a certain advertisement has the best 
headline, general layout, and illustration, if the public concerned dislikes 
that advertisement and prefers another. Therefore, the question asked 
the respondent should not lead the respondent to believe that he is 


1 Unpublished data from several sources also give information on these points. 
matching wits with experts, but should let him feel free to react as he
usually acts, not as he should act. 

With respect to sampling problems, it has been shown that coupon or 
mail responses are not usually representative of the general population 
or of a specific mailed sampling. This is sometimes due to dilution or
inflation caused by the habitual coupon clipper or, even more discon-
certing, by the inertia of the temporarily uninterested individual or the
overzealousness of the individual who has a special interest in the problem
at hand. In addition, Link (3) points out that mail inquiries allow
respondents too much time for critical evaluation, which may lead to distortion.

The author recently had an opportunity to study the following 
questions in respect to consumer jury responses: (1) what effect does the 
form of the question have upon preference for an advertisement, and (2) 
what differences, if any, result from a comparison of magazine ballots 
vs. personal interview ballots? 


Procedure 


The advertisements in question were printed in the interests of a well-
known, nationally distributed drug product. The original advertise-
ment in reality consisted of two advertisements placed side by side on a
full page of a national magazine. The headline of the complete ad-
vertisement asked the reader to indicate which ad he or she voted for,
with a small subscript challenging the reader to “match wits with the
experts.” The two advertisements then appeared, followed by a short
bit of copy and a coupon. In return for the reader’s reply, he was to 
receive a free sample of the product. These two advertisements will be 
referred to as A and B. Advertisement B subsequently appeared singly 
in three other national magazines, but Advertisement A never appeared 
elsewhere. 

The double advertisement asked the reader which advertisement
was the better. In order that coupon replies and personal interview
replies be comparable, it was imperative that the personal interview
question also ask which advertisement was better. However, it was
thought desirable to check the influence of wording the question in this
fashion against a wording believed to be more appropriate, i.e., “most
interesting.” The questionnaire was therefore constructed with this aim
in view.

Two forms of the questionnaire were constructed, each asking for 
data concerned with usage of the product, recognition of the advertise- 
ments, and readership of the magazine in which the advertisements 
originally appeared. In addition, each person was asked which
advertisement interested him more and which advertisement he thought
was better. On Form A of the questionnaire, the “interest” question
preceded the “better” question, and on Form B the order of these two
questions was reversed. Other than this, the two forms were identical.
Additional data were secured bearing upon the respondent’s age, sex, and 
economic status. 

Proofs of the original advertisement were obtained and by appropriate
cutting were made into two separate advertisements, each as it might
appear singly. These were presented to the respondent simultaneously, 
but the right-left position was systematically varied to avoid any time- 
order error. In the original magazine appearance this was of course 
impossible. 

A total of 304 interviews were conducted by experienced interviewers. 
These were done about 2 months after the appearance of the original 
advertisement and were stratified into the conventional A, B, C, and D 
economic groups. Half of the interviews were done with each form of the 
questionnaire, and half were done with each sex. No attempt was made 
to control the age distribution, although the approximate age was noted 
for each interviewee. Geographically, the interviews were done in cities 
and small towns distributed along the eastern coast. 

Several internal checks of the data indicate that the sample was 
reasonably representative. For example, although no effort was made 
to interview the same number of men and women with each form of the 
questionnaire, about one-half men and one-half women were found to 
have been interviewed with each form. Similarly, although economic 
status for the whole 304 interviews was stratified, no attempt was made 
to stratify within one questionnaire form. However, approximately the 
correct proportions were maintained by form of the questionnaire. 
Finally, about the same number of users of some brand of the product 
were found to have been interviewed with each form of the questionnaire. 


Results 


The data presented in Tables 1 and 2 refer to the material collected
from all interviews irrespective of the form of the questionnaire used.
Cases where respondents refused to choose between the two advertise-
ments have been eliminated throughout to facilitate comparisons.
These constituted only 10% to 12% of the cases, and their elimination
did not materially change the results. The category “interesting”
always refers to the question phrased using that word, and the category
“better” refers to the question using the word better. Table 1 indi-
cates that in all comparisons Ad A was judged superior regardless of the
question asked or the way the data were collected. A’s superiority
increases when “better” is asked in the personal interview, and in this
case it agrees closely with coupon returns.


Table 1

Per Cent for Each Advertisement in Total Interview Group and Coupon Group

                             Ad A    Ad B     N
Personal Interview
    Interesting               53%     47%    266
    Better                    58%     42%    274
Coupon
    Better                    59%     41%    239


Table 2

Per Cent for Each Advertisement According to Economic Status, Sex, and
Use of Some Brand of the Product*

                                More Interesting          Better
                               Ad A   Ad B     N      Ad A   Ad B     N
Economic status    A            59%    41%    29       55%    45%    29
                   B            51%    49%    53       60%    40%    57
                   C            52%    48%   104       57%    43%   107
                   D            54%    46%    80       57%    43%    81
Sex                Male         55%    45%   125       60%    40%   131
                   Female       51%    49%   140       55%    45%   142
Use of some brand  Users        50%    50%   103       54%    46%   105
of the product     Non-users    55%    45%   163       60%    40%   169

* Only 3% of the total group of respondents used the brand of the product
advertised, and therefore this group was considered too small for any
statistical analysis.

In no case, however, did the critical ratios of these differences
reach 3. The difference of 18% in favor of Ad A in terms of coupon
returns yields a critical ratio of 2.83, or over 99 chances in 100 that the
difference is not the result of sampling errors. Likewise, the difference
of 16% in favor of Ad A in the personal interview returns when “better”
was asked gives 99 chances in 100 that the difference is not the result
of sampling errors.
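The critical ratios quoted here can be reproduced by treating each vote as a split between the two advertisements and comparing the observed proportion with an even 50/50 split. The article does not state its formula. The sketch below assumes the standard error is sqrt(pq/N) for the observed proportion. That assumption gives about 2.83 for the coupon votes (59%, N = 239) and about 2.68 for the interview "better" votes (58%, N = 274). The "chances in 100" conversion uses the one-tailed normal probability, which is also an assumption.

```python
import math


def split_critical_ratio(pct_a, n):
    """Critical ratio of an Ad A vs. Ad B split against an even 50/50 split.

    The A-minus-B difference equals 2 * (p - .5) and its standard error is
    2 * sqrt(p * q / N), so the factor of 2 cancels.
    """
    p = pct_a / 100
    se = math.sqrt(p * (1 - p) / n)
    return (p - 0.5) / se


def chances_in_100(cr):
    """One-tailed normal probability, per 100, that a difference is not a sampling error."""
    return 100 * 0.5 * (1 + math.erf(cr / math.sqrt(2)))


for label, pct_a, n in [("Coupon, better", 59, 239), ("Interview, better", 58, 274)]:
    cr = split_critical_ratio(pct_a, n)
    print(f"{label}: CR = {cr:.2f}, about {chances_in_100(cr):.1f} chances in 100")
```

The same function gives about 0.70 for the 53% vs. 47% split on 135 Form B interviews. It also gives about 3.17 for the 63% vs. 37% split on 139 Form A interviews. Both values match those reported later in the article, which suggests this is the formula the author used.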

When the data are fractionated for economic status, sex, and use of
some brand of the product, the base N’s become too small to yield any
statistically significant differences. However, such breakdowns can
suggest which groups the coupons may have come from, given that Ad A
received 59% of the coupon votes. These data are therefore presented
in Table 2.
Some of these percentages agree quite closely with coupon returns,
for example, the percentage of men picking Ad A as better, the non-users
picking Ad A as better, and the B economic group selecting Ad A as
better. From this, it looks as if coupons might have been returned more
frequently by men, by the B economic group, and by non-users of the
product; otherwise coupon returns would not agree as well with the
personal interview data. As a matter of fact, it is known that coupons
were returned with men’s signatures more frequently than with women’s.
When the coupon returns were broken down by sex, 60% of the men
preferred Ad A and 55% of the women preferred the same advertisement.
This agrees perfectly with the interview data for sex on the same form
of the question. The coupons did not contain data allowing other
fractionations.
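The sex breakdown of the coupons also allows a rough check on how heavily men were represented among coupon voters. Suppose the overall 59% for Ad A were exactly a weighted average of the 60% of men and 55% of women who chose it. The implied share w of coupons from men then follows from w × 60 + (1 − w) × 55 = 59. This back-of-the-envelope sketch is not an analysis reported by the author. It assumes every coupon could be classified by sex. Because all three percentages are rounded, the implied share is quite sensitive to rounding, as the loop shows.

```python
def implied_share(overall_pct, pct_group1, pct_group2):
    """Weight w of group 1 such that w * pct_group1 + (1 - w) * pct_group2 == overall_pct."""
    return (overall_pct - pct_group2) / (pct_group1 - pct_group2)


# Coupon figures for Ad A: 59% overall, 60% of men, 55% of women
print(f"implied share of coupons from men: {implied_share(59, 60, 55):.0%}")

# The overall figure is rounded, so try the edges of its rounding interval
for overall in (58.5, 59.5):
    print(f"  if the overall figure were {overall}%: {implied_share(overall, 60, 55):.0%}")
```

The point estimate is 80%, but the plausible range of roughly 70% to 90% shows why the argument is only suggestive.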

It will be recalled that on one form of the questionnaire, “interest” 
was asked first followed by “better” and that this procedure was reversed 
on the other form. The previous tables present the data grouped by the
wording of the question but pool responses irrespective of the order of
the questions. To make the most legitimate comparison with coupon
returns, it is necessary to consider only responses to the personal
interview question asking which advertisement is better, unbiased by a
previous question asking which advertisement is more interesting. To do
this, responses were tabulated separately for Form B of the questionnaire
and only for the question asking for the “better” advertisement, which
appeared first on this form. Table 3 gives these results.


Table 3

Per Cent for Each Advertisement According to Economic Status, Sex, and Use of the
Product for the “Better” Question Only and When It Appeared First (Form B)

Total Groups
                                      Ad A    Ad B     N
Coupon (Better)                        59%     41%    239
Personal Interview (Better)            53%     47%    135

Fractionations for Personal Interview
                                      Ad A    Ad B
Economic status     A                  50%     50%
                    B                  60%     40%
                    C                  58%     42%
                    D                  39%     61%
Sex                 Male               55%     45%
                    Female             51%     49%
Use of some brand   Users              51%     49%
of the product      Non-users          54%     46%
Here again, the N’s are much too small to yield any statistically
significant differences, but the results shown here suggest that coupons
probably were returned most heavily from the B economic group, given
that 59% of magazine voters selected Ad A. On the whole, it is probable
that coupon responses overestimate the “superiority” of Ad A even in
terms of “betterness.” (The critical ratio of the difference of 18% in
favor of Ad A yielded by magazine votes is 2.83, whereas the critical
ratio of the difference of 6% in favor of Ad A yielded by personal
interview results is .70.)

Another factor that could have led to the large majority of coupon
responders selecting Ad A was the fact that Ad A always appeared in the
left position in the magazine, whereas the right-left position was
alternated in the personal interview. Other studies have shown the
influence of time-order error (1), and it is conceivable that it operated
in this instance.

It is obvious that differences arise not only from the form of the 
question but also from the effect early questions in the questionnaire 
have upon other subsequent questions. Table 4 presents material upon 
this factor.


Table 4

Influence of the Order of Questions upon Preference for an Advertisement

                 Form A                                 Form B
                 Ad A   Ad B     N                      Ad A   Ad B     N
Interest (1st)    55%    45%   132     Better (1st)      53%    47%   135
Better (2nd)      63%    37%   139     Interest (2nd)    51%    49%   134

The material in this table shows again that Ad A is superior in every
comparison, but the degree of superiority varies depending upon the
context of the question. It seems that from a pure interest point of
view, Ad A is picked by 10% more people than Ad B, but when this
question is followed by a question referring to “better,” Ad A’s
superiority jumps to 26%. (The critical ratio of the latter difference is
3.17. In all other comparisons in Table 4 the critical ratio approached
zero.) On the other hand, if “better” is asked first, followed by
“interest,” the change is only from 6% to 2% in favor of Ad A. It looks
here as if, once a person has said that an advertisement is better, he is
less likely to turn about-face and choose another advertisement as more
interesting; but if he has picked an advertisement as more interesting
and is then confronted with a question implying that there is such a
thing as a “better” advertisement, he may be more prone to change his
choice. The data presented in Table 5 indicate that the large majority
of people do not change their original selection, but that those who do
change materially alter results for the total group.

Out of 152 answers to Form A, 107 showed no change, and out of 152
answers to Form B, 109 showed no change. Therefore, the appearance of
either question before the other is not especially conducive to change
of choice. However, considering only those who changed preference,
when better precedes interest, the changes were about evenly split,
whereas, when interest precedes better, twice as many persons changed
to Ad A as changed to Ad B. (The critical ratio in the latter instance,
with only 45 cases, is 2.37.) Evidently, Ad A has some features which
respondents who originally selected B as more interesting feel experts
would agree upon as better. What those features are is not obvious
from the present analysis.


Table 5

Major Changes in Choice of Advertisement by Form of the Questionnaire

                                        Form A               Form B
                                  (interest first,      (better first,
                                   better second)       interest second)
Changed to Ad B on 2nd question         33%                   49%
Changed to Ad A on 2nd question         66%                   51%
N (persons who changed)                  45                    43

An interesting commentary on the whole study is that although
Ad A always gathered the majority of the votes (sometimes by a small
margin), Ad A was never published as a separate advertisement, whereas
Ad B ran separately three times. This supports the contention that the
copy writer, although he may be an excellent judge of an advertisement
for himself or a small number of people like him, may not be able to
predict which of several advertisements will please the majority of the
reading public with which he is really most concerned (3).


Conclusions 


Any conclusions drawn must of necessity be interpreted in the light 
of the fact that, due to the relatively small number of cases interviewed, 
most of the differences do not yield critical ratios of 3 or more. However, 
in several of the more pertinent comparisons, the chances are greater 
than 90 in 100 that the obtained differences are not due to sampling 
errors. Remembering the limitations of the study, the following con- 
clusions may be drawn. 
1. A comparison of the results of a consumer jury test carried out by
magazine votes with one carried on by personal interview indicates that 
the two give different results. Magazine returns indicated 59% for 
Ad A whereas the most legitimate comparison figure for the personal 
interview group was 53% for Ad A. The critical ratio of the difference 
is 1.12, or 87 chances in 100 that the difference cannot be explained in 
terms of sampling errors. At least one other study has shown a similar 
discrepancy (3, p. 119). 

2. The differences found between magazine and personal interview 
results may be a result of unrepresentative sampling of coupon responses. 
It appears that the middle economic groups’ responses more nearly 
approximate coupon returns than other groupings. The magazine in 
which the original advertisement appeared would tend to be read by the 
top economic groups more than the lower groups. 

3. There is a possibility that a time-order error may be introduced 
in magazine voting and that this can be balanced out by a properly 
controlled personal interview study. 

4. The form of the question has an important bearing upon results 
obtained. Different responses are obtained when a person is asked to 
select the better of two advertisements than when he is asked to choose 
the one that interests him the most. It is likely that the latter form 
will give a truer picture of him as a consumer than the former form. 

5. For some people, the answers to a question will be unduly influenced 
by preceding questions and answers. 
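The critical ratio in conclusion 1 differs from the earlier ones because it compares two independent samples: the 239 coupon votes and the 135 Form B interview votes. A standard way to compute it divides the difference between the two proportions by the square root of the sum of their sampling variances. This reproduces the reported 1.12 and, taken one-tailed, the 87 chances in 100. The article does not give its formula, so treat this as an illustrative reconstruction that assumes the two samples are independent.

```python
import math


def diff_critical_ratio(pct1, n1, pct2, n2):
    """Critical ratio of the difference between two independent percentages."""
    p1, p2 = pct1 / 100, pct2 / 100
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se_diff


# Coupon votes for Ad A vs. Form B interview votes for Ad A ("better" asked first)
cr = diff_critical_ratio(59, 239, 53, 135)
one_tailed = 0.5 * (1 + math.erf(cr / math.sqrt(2)))  # standard normal CDF at cr
print(f"CR = {cr:.2f}; about {100 * one_tailed:.0f} chances in 100")
```

The comparatively low ratio shows that a 6-point gap between samples of this size is well within what sampling error alone could produce fairly often.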


Received September 5, 1944. 
Book Reviews


Woolpert, E. D. [Ed.] Municipal Personnel Administration. (3rd 
edition.) Chicago: International City Managers’ Assoc., 1942. Pp. 
xii + 429. $7.50. 


Municipalities and other jurisdictions that are entertaining the idea of installing a personnel
system, contemplating the institution of an in-service training program, or seeking a
comprehensive, realistic guide to workable solutions of personnel problems should
consult Municipal Personnel Administration. This book is published by the International
City Managers’ Association as the ninth volume of its series on municipal
administration.

While public officials are inclined to be interested in specific problems rather than
general practices and trends, they should be ever mindful of the need for perspective
and long-range policies and programs. Accordingly, this volume offers more than
expedient devices for the practitioner. The first chapter, on the personnel problem,
supplies this perspective by providing a broad background and a setting for the more
specific discussions of personnel problems which make up the remainder of the volume.

The need for a rational organization of personnel activities is the first specific per- 
sonnel problem considered. It is indicated that the dimensions and complexity of this 
problem require positive, concerted effort for effective solution. A basic requirement 
for dealing adequately with this personnel problem is to have clearly defined the respon-
sibility for the administration of such activities. This problem exists even
in the smaller cities, where it is not always feasible to have a separate personnel agency.
It is pointed out, however, that these cities must recognize, regardless of the size of
their work force, the existence of such problems as classification, rates of pay,
recruitment, hours of work, attendance, and leaves of absence. The text outlines several ways whereby these
municipalities can meet their personnel problems. The dominant theme is that varia- 
tions in structural patterns of personnel organization should not be taken seriously as 
long as they result from the application of sound general principles to different local 
situations. The basic problem of organization is solved when the work is analyzed and 
the methods and resources available are considered and evaluated. 

In the ensuing several chapters where the principal phases of personnel administra- 
tion are considered, an attempt has been made to outline standards of administration as 
guideposts for individual personnel agencies. In the chapter on position-classification, 
it is shown that information relative to the duties and responsibilities attached to the 
various individual positions within a given service is of primary importance in the 
development of a personnel program. Accordingly a position-classification plan de- 
signed to secure and utilize these facts should be one of the first aims in preparing and 
administering a personnel program. After stating the nature and objectives of position- 
classification, serious attention is given to the development of a classification plan, the 
problems entailed in introducing the plan, and the basic requisite of continuous adminis- 
tration of the plan. 

In a concise, yet complete, chapter on salary and wage administration, it is shown
that the intangible advantages of public employment roughly balance the disadvantages.
A workable outline is presented for those seeking to attain the principal objective of a 
pay plan—equal pay for equal work. 
In Chapter 5 the reader finds an objective discussion of problems involved in the
recruitment and selection of qualified personnel for public service. After the relation
of recruitment to classification is indicated, the various forms and methods of examining
or testing applicants are presented for use in selecting employees with a capacity to
learn. In logical sequence there follows a chapter on employee training, in which most
of the elements entailed in the organization for and methods of training are found.
The point is made that the appropriate method of training should be determined pragmatically,
on the basis of utility and applicability. In Chapter 7, considerable attention
is given to the methods of appraising, through promotional tests, the capacity of employees
for promotion. Personnel administrators are reminded that there is an inverse
relationship between the need for specific knowledge, the factor which a well-constructed 
examination can measure reliably, and the level in the organizational hierarchy which 
an employee occupies. The higher one goes up the organizational ladder, the greater
should be the emphasis upon capacity for administration.

The problem of reports of performance is treated in an unbiased manner. The 
evaluation of employees is regarded as an essential part of administration since it is 
quite common to find disparity between ability and performance. In any balanced
personnel program, the determination of knowledge, skills, and aptitudes should be
supplemented by measures of employee performance. Various types of rating systems are
described, and an effort is made to evaluate the degree of success found with each. The
analytic checklist type seems to be favored because it tends to eliminate the weakness 
of evaluation by the rating officer prevalent in the graphic rating scales. 

What has been presented to this point is regarded as the skeleton of an effective 
personnel program. The chapter on morale and conditions of employment includes 
other elements which enter into such a program, enabling it to give maximum service to
the community. The quality of supervision within an organization is presented as the 
most important factor in building and maintaining high morale. Closely related to 
the question of morale is the complex matter of discipline. While the authors favor some
form of disciplinary process similar to the employee self-discipline plan inaugurated in
the refuse collection division of the Los Angeles city government, they caution responsible
officials against copying this or any other plan indiscriminately as a general
model. Since self-discipline can best be developed and maintained through the organization
of employees themselves, the following chapter on employee relations delves into
the many ramifications of the personal equation in administration, particularly as they
relate to the development of a municipal employee relations policy. As the final element
of a comprehensive personnel program, Chapter 12 goes into the ramifications of a
properly planned and administered retirement system. While the personnel administrator
does not usually administer the retirement system, the authors advise that he participate
in its planning, because the system plays an important part in the development and
maintenance of effective government.

After covering the broad background from recruitment to retirement, Chapter 13, 
entitled “Special Administrative Problems,” presents several aspects or phases of per- 
sonnel administration which have to do with the overall administration of a personnel 
program, rather than with the program per se. They include such items as personnel 
rules and regulations, personnel forms and records, research, measurement, and public 
relations. 

Practitioners, namely administrators, personnel psychologists, and heads of oper- 
ating departments and agencies, will find the book extremely valuable in developing 
workable solutions to personnel problems. The approach is primarily utilitarian, with
emphasis upon the day-to-day personnel problems which confront these readers.
The discussions in the book have achieved a high degree of realism through the careful
selection of specific problems, techniques, and procedures derived from the
authors’ own experiences and observations. In all the discussions there is no pretense 
of presenting a detailed manual or a model system of municipal administration. 

The tone and outlook of the book in general are objective and practical. The object
is not to suggest complete overhauling of personnel administration overnight where 
obvious disparity exists between accepted standards and practices in given organiza- 
tions. Instead, the personnel officer is made to realize that changes for the improvement 
of public service must be introduced by progressive steps. 

Progressive municipal officials and personnel psychologists can hardly afford to do 
without this volume designed as a practical guide. The general administrator will find 
it very useful in delineating the principal techniques derived by personnel experts and 
learning the relationship between these techniques and management problems confront- 
ing the chief administrator and his department heads. The personnel officer will find 
the text indispensable for two reasons: first, the principles and activities of a compre- 
hensive personnel program are developed from discussions of the personnel problem; 
and second, the position of personnel administration in the broader framework of ad- 
ministrative management is portrayed quite effectively. In addition, students of mu- 
nicipal administration, particularly those interested in public service careers, will find 
the book most profitable in obtaining a balanced picture of the interests and approaches 
of the administrator and the personnel officer. 


John K. McKay 
California State Personnel Board 


Melville, S. Donald. Color vision. Reprinted from The Optometric
Weekly, August 24, August 31, September 7, and September 14, 1944.
Pp. 19.


The author of this article has given a condensed version of the physiology and the 
psychology of color vision. In most respects the sources cited are up to date and
adequately evaluated. However, so much work is currently going on in this field that
one should be careful not to accept any crystallization as final. The psychology of
color is not presented as a psychologist particularly interested in the field would
probably present it, but it touches upon most of the facts.

The article works up to a final topic which is probably of considerable interest to 
readers of the Weekly, namely, abnormal color vision and tests for color vision. Most 
of the current tests are described briefly but no thorough-going evaluation is attempted. 
The only new material is derived from some tests by the author on the treatment of 
color blind individuals. His interpretation of these results is in line with other studies 
which give no indication that color blindness can be altered by dietary or training 
methods. A list of 89 references is appended, which, while not exhaustive, gives a good 
sampling of the field. Undoubtedly the article answers very well the purpose for which 
it was written, namely, that of informing a particular group of professional men con- 
cerning a field related to their profession. 


Forrest Lee Dimmick 
Hobart College,
Geneva, New York


MacKintosh, J. M. The war and mental health in England. New York:
The Commonwealth Fund, 1944. Pp. 91. $.85. 


This book consists of a series of informal essays on mental health in England during
the first four years of the present World War and on the outlook for the post-war period.
Part One, entitled “The impact of war,” covers “The process of adjustment, 1939-40,”
“The lonely year, 1940-41,” “Defense, preparation and alliance, 1941-42,” and “The end
of the beginning, 1942-43.” Part Two, entitled “Mobilization for peace,” covers “Hospital
services,” “Voluntary organizations for mental health,” “Professional education in mental
health,” and “Some problems of the future.” Much of the content consists of either anecdotal material
or generalizations for which little supporting data are introduced. The general point 
of view that scientific services, mental health, education and propaganda can make a 
great contribution both to war and peace is one with which few will disagree. But 
the professional person, however much he may enjoy the style and manner of presenta- 
tion, may well feel that the book is somewhat vague and over-optimistic with regard 
to the contributions that can be made and quite lacking in concreteness and specificity 
as to their character. 


John E. Anderson 
University of Minnesota 
Books, monographs, and pamphlets for listing and possible review should be
sent to Donald G. Paterson, Editor, Department of Psychology, 
University of Minnesota, Minneapolis 14, Minnesota 


Psychology for the armed services. Edited by Edwin G. Boring. Washington, D. C.: 
The Infantry Journal, 1945. Pp. 544. $3.00. 

Joan chooses occupational therapy. Meta Cobb and Holland Hudson. New York: 
Dodd, Mead & Co., 1944. Pp. 214. $2.00. 

Methods of vocational guidance. Gertrude Forrester. New York 14: D. C. Heath & 
Co., 1944. Pp. 480. $3.00. 

Marriage and family counseling. Sidney E. Goldstein. New York: McGraw-Hill Book 
Co., 1945. Pp. 457. $3.50. 

Dictionary of education. Carter V. Good. New York: McGraw-Hill Book Co., 1945. 
Pp. 496. $4.00. 

Developmental psychology (revised edition). Florence L. Goodenough. New York: 
D. Appleton-Century Co., 1945. Pp. 723. $3.75. 

Job exploration workbook. Milton E. Hahn and Arthur H. Brayfield. Chicago: Science 
Research Associates. Pp. 95. $.96. 

Occupational laboratory manual. Milton E. Hahn and Arthur H. Brayfield. Chicago:
Science Research Associates. Pp. 29. $1.00. 

Guide to guidance. An annotated bibliography. Volume VII. M. Eunice Hilton. 
Syracuse: Syracuse University Press, 1945. Pp. 62. $1.00. 

Twenty careers of tomorrow. Darrell and Frances Huff. New York: McGraw-Hill Book 
Co., 1945. Pp. 281. $2.50. 

Mainsprings of civilization. Ellsworth Huntington. New York 16: John Wiley & 
Sons, Inc., 1945. Pp. 660. $4.75.

Mental disorders in later life. Edited by Oscar J. Kaplan. Stanford University: Stan- 
ford University Press, 1945. Pp. 436. $5.00. 

The governing of men. Alexander H. Leighton. Princeton: Princeton University Press, 
1945. Pp. 450. $3.75. 

The prediction of success for students in teacher education. Lycia O. Martin. New 
York: Bureau of Publications, Teachers College, Columbia University, 1945. Pp. 
120. $2.00. 

Unconsciousness. James G. Miller. New York: John Wiley & Sons, 1942. Pp. 329. 
$3.00. 

Prediction of the adjustment and academic performance of college students by a modification 
of the Rorschach method. Ruth L. Munroe. Stanford University: Stanford Uni- 
versity Press, 1945. Pp. 96. $1.25. 

Jobs for the physically handicapped. Louise Neuschutz. New York 16: Bernard Acker- 
man, Inc. Pp. 230. $3.00. 

Soldier to civilian. G. K. Pratt. New York: McGraw-Hill Book Co., 1945. Pp. 233. 
$2.50. 

Psychology of sex relations. Theodor Reik. New York: Farrar & Rinehart, Inc., 1945. 
Pp. 243. $3.00. 
Intelligence and its deviations. Mandel Sherman. New York 10: Ronald Press Co.,
1945. Pp. 300. $3.75. 

Educational psychology revised. Edited by Charles E. Skinner. New York: Prentice- 
Hall, Inc., 1945. $3.75. 

Elementary educational psychology. Edited by Charles E. Skinner. New York: Pren- 
tice-Hall, Inc., 1945. $3.25. 

Sampling statistics and applications. J. G. Smith and A. J. Duncan. New York: 
McGraw-Hill Book Co., 1945. Pp. 492. $4.00. 

Vocational interest patterns. Irene Wightwick. New York: Bureau of Publications, 
Teachers College, Columbia University, 1945. (Contributions to Education No. 
900.) Pp. 231. $2.60. 

An analysis of the work of general clerical employees. New York: Teachers College, 
Columbia University, 1944. Pp. 100. (Contributions to Education No. 903.)

Employment tests in industry and business. (A bibliography.) Princeton, N. J.: Industrial
Relations Section, Princeton University, 1945. Pp. 46. $.50.

Putting the disabled veteran back to work, II. Industrial Hygiene Foundation, Pitts- 
burgh, Pennsylvania. Pp. 33. $.25. (Part I of Proceedings of Ninth Annual 
Meeting of Industrial Hygiene Foundation of America, Inc., November 15-16, 
1944.) 

Rehabilitation—a plan to help you employ disabled veterans and other handicapped persons. 
American Mutual Alliance, 919 N. Michigan Ave., Chicago 11, 1944. Pp. 22. 
Free. 

You and the returning veteran. A guide for foremen. Allis-Chalmers Manufacturing 
Co., P. O. Box 512, Milwaukee 1, Wisconsin. Pp. 40. Free. 

















